Background

Neoadjuvant chemotherapy and immunotherapy (NAC) is considered the standard regimen for patients with locally advanced breast cancer and is increasingly being used in patients with early-stage breast cancer. Its main goal is to decrease tumor size [1]. By decreasing tumor size, NAC may enable patients who would otherwise undergo mastectomy to be treated with breast-conserving therapy. In patients initially scheduled for breast-conserving therapy a smaller lumpectomy might be performed, potentially resulting in improved cosmesis. Furthermore, NAC allows in vivo assessment of tumor response and therefore chemosensitivity. Individual responses to NAC vary widely, depending on molecular subtype (i.e. estrogen receptor (ER) negative and human epidermal growth factor receptor 2 (HER2) positive tumors respond better) [2], tumor size [3] and treatment regimen [4,5,6].

In order to achieve the maximum surgical advantage from NAC, it is essential that tumor response and residual tumor can be evaluated correctly prior to surgery. Studies show that magnetic resonance imaging (MRI) is the most accurate in determining residual disease after NAC compared to physical examination, mammography and ultrasound [7]. Unfortunately, MRI might either overestimate or underestimate residual tumor size. The overall loss of vital tumor cells might not always be reflected by a reduction in tumor diameter as fibrous stroma might persist and even be enhanced on MRI [8]. Tumor size assessment itself on MRI might also be challenging, as NAC causes various histopathological changes in tumor cellularity, causing some tumors to show concentric shrinkage patterns, while others may crumble (“fragmentation”) into scattered islands of tumor cells. In the latter case, a response is present, but this might not be captured by simply measuring tumor size, as these individual scattered foci cannot be measured independently on MRI.

Previous research shows that triple negative breast tumors regress significantly more often as a shrinking mass than HER2 positive and ER positive/HER2 negative tumors [9]. In addition, Kim et al. demonstrated that there is a significant difference in MRI-based response patterns after NAC between pathological responders and non-responders [8]. However, these studies analyzed response patterns after completion of NAC. At this point, changes in treatment regimen are no longer possible. Furthermore, the ACRIN study suggests that MRI early in NAC treatment is a stronger predictor of pathological response than MRI after NAC [10]. Hence, studying the association between response patterns on MRI during NAC and final histopathological response (when switching to a cross-resistant NAC in cases with poor (predicted) response might still be possible) could have more clinical implications. To our knowledge, no study has tested correlation between MRI-based response patterns halfway through NAC and pathological response.

Therefore, the main goal of this study was to analyze MRI-based response patterns halfway through NAC and to investigate their role as an early therapy response predictor. Secondary goals were to compare the predictive value of MRI-based response patterns measured halfway through NAC to after NAC and to compare preoperative tumor diameter on breast MRI with tumor size on pathology assessment to evaluate residual tumor assessment in different MRI-based response patterns. Finally, we evaluated interobserver agreement for assessment of MRI-based response patterns and tumor diameter.

Methods

Patient selection

We included all consecutive patients who were treated with NAC for histologically proven primary invasive breast cancer between January 2012 and June 2015, and in whom tumor response was monitored with MRI in the Maastricht University Medical Center+ (MUMC+). In this hospital, standard response monitoring with MRI was performed before, halfway through and after NAC. MRI after NAC was often not performed in the case of complete radiological response halfway through NAC or in the case of mastectomy. Patients who underwent surgery after NAC and underwent at least two MRI examinations were included in this study, provided that the first MRI was performed at baseline, i.e. prior to NAC, and the second MRI after completion of at least three cycles of chemotherapy. Exclusion criteria were unknown ER, progesterone receptor (PR), or HER2 status prior to NAC, previous ipsilateral breast surgery, previous systemic treatment because of contralateral breast cancer and presence of distant metastasis at time of diagnosis.

MRI protocol

Breast MRI was performed on a 1.5 T scanner using a dedicated bilateral 16-channel breast coil (Philips Healthcare, Best, The Netherlands). As contrast agent, gadobutrol (Gadovist®, Bayer Health Care, Germany) was automatically injected through a catheter in the antecubital vein at 0.1 mmol/kg body weight, followed by a saline flush. The imaging protocol consisted of two-dimensional T2-weighted images without fat suppression, dynamic contrast-enhanced fat-saturated T1-weighted images using gadobutrol as a contrast agent, and diffusion weighted imaging (DWI). Imaging parameters can be found in Additional file 1.

Imaging analysis and tumor response pattern assessment on MRI

Two experienced breast radiologists (JBH and RMM), with 7 and 12 years of experience respectively, independently reviewed all breast MRI scans. They were blinded to pathological tumor characteristics and pathological outcome after surgery.

MRI-based response patterns of breast carcinomas during and after NAC were classified into six categories adapted from the classification suggested by Kim et al. [8] (Fig. 1): type 0 (complete radiologic response); type 1 (concentric shrinkage > 3 mm without surrounding lesions); type 2 (crumbling: shrinkage with residual multinodular lesions); type 3 (diffuse contrast enhancement in whole quadrants); type 4 (stable disease, i.e. no response, shrinkage < 3 mm or increase < 3 mm); type 5 (progressive disease, i.e. increase in tumor size > 3 mm or new lesions). A short introduction about the MRI-based response patterns and a test case were provided for both readers. Furthermore, tumor size was determined by measuring the largest diameter of the largest breast lesion on the T1-weighted MRI sequence at peak enhancement (i.e. first dynamic phase after contrast injection) in one view. Readers were allowed to use multiplanar reconstructions to assess the largest tumor diameter.

Fig. 1

Magnetic resonance imaging (MRI)-based response patterns of breast carcinomas on breast MRI halfway through and after neoadjuvant chemotherapy
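The six categories above can be sketched as a small decision rule. This is only an illustration and not the readers' actual procedure: in the study the patterns were assigned visually by radiologists. The field names, the order in which the rules are checked, and the handling of a change of exactly 3 mm (treated here as stable disease) are assumptions.

```python
from dataclasses import dataclass

# 3 mm threshold taken from the category definitions in the text.
# Treating a change of exactly 3 mm as "stable" is an assumption.
CHANGE_THRESHOLD_MM = 3.0


@dataclass
class FollowUpReading:
    """Hypothetical summary of one reader's follow-up MRI assessment."""
    baseline_diameter_mm: float
    followup_diameter_mm: float
    residual_enhancement: bool   # any residual enhancing lesion visible
    multinodular_residual: bool  # reader judges the tumor to have crumbled
    diffuse_enhancement: bool    # diffuse enhancement in whole quadrants
    new_lesions: bool


def response_pattern(r: FollowUpReading) -> int:
    """Return the MRI-based response pattern (0-5) for one reading."""
    change = r.followup_diameter_mm - r.baseline_diameter_mm
    # Type 5: progressive disease (growth > 3 mm or new lesions).
    if r.new_lesions or change > CHANGE_THRESHOLD_MM:
        return 5
    # Type 0: complete radiologic response.
    if not r.residual_enhancement:
        return 0
    # Type 3: diffuse contrast enhancement in whole quadrants.
    if r.diffuse_enhancement:
        return 3
    # Type 2: crumbling, i.e. shrinkage with residual multinodular lesions.
    if r.multinodular_residual:
        return 2
    # Type 1: concentric shrinkage > 3 mm without surrounding lesions.
    if change < -CHANGE_THRESHOLD_MM:
        return 1
    # Type 4: stable disease (change within 3 mm in either direction).
    return 4
```

For example, a tumor measuring 30 mm at baseline and 20 mm at follow-up, with a single residual enhancing mass, would be assigned type 1 under these assumptions.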

Treatment

All patients received systemic treatment and underwent surgery at the MUMC+. NAC treatment consisted of two possible regimens. All patients with HER2-negative tumors received six cycles of docetaxel, doxorubicin and cyclophosphamide. HER2-positive tumors were treated with four cycles of doxorubicin and cyclophosphamide followed by four cycles of docetaxel and trastuzumab. After NAC, breast conserving therapy or mastectomy and surgery of the ipsilateral axilla (sentinel lymph node biopsy in the case of node-negative (N0) and axillary lymph node dissection in the case of N+) were performed.

Histopathological assessment

All pre-treatment core biopsies and post-treatment surgical specimens were routinely processed. Histopathological analyses were performed by an experienced breast pathologist in accordance with our national breast cancer guideline at the time of diagnosis [11].

Pre-treatment core biopsies were used for grading (according to the modified Bloom-Richardson grading system) and determining receptor status of the tumor. Hormone receptor status, i.e. ER and PR status, was determined by immunohistochemical evaluation and interpreted according to national guidelines in which > 10% of tumor staining is used as a positive cutoff. HER2 status was determined using fluorescence in situ hybridization (FISH) analysis to detect gene amplification in biopsied tissue and was analyzed according to the ASCO CAP guidelines [12]. Tumors were stratified into molecular subtypes based on immunohistochemical evaluation and FISH. Hormone receptor status was considered positive (ER+) when ER and/or PR status was positive and negative (ER-) if both were negative. There were four molecular subtypes: ER+/HER2-, ER+/HER2+, ER-/HER2+ and ER-/HER2-.
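The stratification into four molecular subtypes can be written down compactly. The sketch below follows the rules stated above (more than 10% staining counts as positive; hormone receptor status is positive when ER and/or PR is positive). The function names and the representation of HER2 status as a boolean FISH amplification flag are illustrative assumptions.

```python
# Positive cutoff from the text: more than 10% of tumor cells staining.
STAINING_CUTOFF_PCT = 10.0


def hormone_receptor_positive(er_pct: float, pr_pct: float) -> bool:
    """Hormone receptor status is positive if ER and/or PR exceeds the cutoff."""
    return er_pct > STAINING_CUTOFF_PCT or pr_pct > STAINING_CUTOFF_PCT


def molecular_subtype(er_pct: float, pr_pct: float, her2_amplified: bool) -> str:
    """Map receptor results to one of the four subtypes used in this study.

    her2_amplified is a hypothetical flag standing in for the FISH result.
    """
    hr = "ER+" if hormone_receptor_positive(er_pct, pr_pct) else "ER-"
    her2 = "HER2+" if her2_amplified else "HER2-"
    return f"{hr}/{her2}"
```

Under this rule a tumor with 5% ER staining, 40% PR staining and HER2 amplification is labeled ER+/HER2+, because PR positivity alone is sufficient for a positive hormone receptor status.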

Histopathological measurement of residual tumor size, which is considered to be the gold standard, was performed in fresh tissue and correlation was tested microscopically in formalin-fixed, paraffin-embedded tissue. This assessment only included invasive foci, not ductal carcinoma in situ (DCIS). DCIS was measured separately. Pathological dimensions were determined using the longest diameter of the residual tumor or in the case of multifocal disease, the primary index tumor. Thereafter, specimens were fixed with formalin.

Histopathological response of the tumor to treatment was evaluated based on reduction of tumor cellularity, using the Pinder classification (Table 1). Pathological complete response (pCR) was defined as absence of macroscopic and microscopic evidence of invasive tumor and absence of ductal carcinoma in situ (DCIS).

Table 1 Pinder classification

For this study, patients with Pinder classification 2iii or 3 (i.e. < 50% or no regression in tumor cells) were categorized as non-responders, while patients with Pinder classification 1i–2ii (i.e. ≥ 50% regression in tumor cells) were classified as pathological responders.
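As a minimal sketch, the responder definition can be expressed as a lookup. The intermediate category labels (1ii and 2i) are assumed from the range "1i–2ii" given above, since Table 1 is not reproduced here; the function name is hypothetical.

```python
# Pinder categories grouped as in the text: 1i-2ii are pathological
# responders (>= 50% regression), 2iii and 3 are non-responders.
# The labels "1ii" and "2i" are assumed from the stated range 1i-2ii.
PINDER_IS_RESPONDER = {
    "1i": True,
    "1ii": True,
    "2i": True,
    "2ii": True,
    "2iii": False,
    "3": False,
}


def is_pathological_responder(pinder_class: str) -> bool:
    """Return True for responders; raise on unknown categories."""
    try:
        return PINDER_IS_RESPONDER[pinder_class]
    except KeyError:
        raise ValueError(f"unknown Pinder category: {pinder_class!r}")
```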

Statistical analysis

Pearson’s correlation coefficient was used to test correlation between the MRI-based response patterns (halfway through and after NAC) and pathological response after NAC. In case multifocal disease was present, the lesion with the largest dimensions on baseline MRI, considered to be the primary index tumor, was included for statistical analysis. Interobserver agreement between both readers classifying the response according to the six MRI-based patterns was calculated with Cohen’s Kappa [13]. The distribution of the MRI-based response patterns in the different breast cancer subtypes was mapped. Mean tumor size and agreement between both readers were calculated. Pearson’s correlation coefficient was also used to test correlation between tumor size on MRI after NAC and pathologically assessed tumor size (gold standard). A difference of more than 5 mm between both measurement techniques or between both readers was considered to be clinically relevant. A p value <0.05 was considered to be statistically significant. All analyses were performed using Statistical Package for the Social Sciences (SPSS), version 22.0 (IBM Corporation, Armonk, NY, USA).
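For readers who want to reproduce the two statistics outside SPSS, the following is a minimal sketch using only the Python standard library. It assumes unweighted Cohen's kappa (the text does not state whether a weighted variant was used) and does not compute p values; function names are illustrative.

```python
from collections import Counter
from math import sqrt


def cohen_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two readers rating the same cases."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both readers must rate the same non-empty set of cases")
    n = len(ratings_a)
    # Observed agreement: fraction of cases given the same category.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: from each reader's marginal category frequencies.
    count_a, count_b = Counter(ratings_a), Counter(ratings_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1.0 - p_e)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)
```

In this setting, ratings_a and ratings_b would hold the response pattern (0 to 5) assigned by each reader per tumor, and pearson_r would be applied to, for example, MRI diameter versus pathological size in millimeters.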

Results

A total of 76 patients with 80 primary breast tumors (4 bilateral) were included. All patients underwent a breast MRI exam before and halfway through NAC; 57 patients also underwent a breast MRI exam after completion of NAC. Baseline characteristics are shown in Table 2. Mean age was 53 years (range 29–72). Most tumors (89%) were classified as invasive carcinoma of no special type (NST) and 11% were lobular carcinomas. Considering receptor status, most tumors (60%) were ER/PR positive and HER2 negative (Table 2).

Table 2 Baseline characteristics

In ten tumors pathological tumor response was assessed with the Miller and Payne grading system and in five of them this classification could not be converted to the Pinder classification. Therefore, these five patients could not be included in any pathological response analyses, but were included in the evaluation of residual tumor size. Furthermore, since reader 1 did not classify any tumors as showing diffuse enhancement (type 3), most analyses did not include the type 3 response pattern.

Baseline MRI - tumor diameter and interobserver agreement

Mean baseline tumor diameter was 30 mm (range 11–76 mm) according to reader 1 and 32 mm (range 12–118 mm) according to reader 2. At baseline, tumor diameter differed by more than 5 mm between both readers in 28 tumors (35%), but in only 5 of them (6%) did this difference result in a different clinical tumor stage.

MRI halfway through NAC - response patterns and interobserver agreement

Figure 2 shows an example of a type 1 response (concentrically shrinking tumor) and of a type 2 response (crumbling tumor).

Fig. 2

Example of a tumor that shrinks concentrically: magnetic resonance imaging (MRI) before neoadjuvant chemotherapy and immunotherapy (NAC) (a) and halfway through NAC (b); and a tumor that crumbles: MRI before NAC (c) and halfway through NAC (d)

As shown in Tables 3 and 5, tumors showing a type-0 response (complete radiologic response) on MRI halfway through NAC have the best pathologic tumor response (90% had > 50% tumor reduction and 83% had pCR).

Table 3 MRI-based response patterns of breast carcinomas on breast MRI halfway through NAC and pathological response per MRI-based response pattern

Patients with a type-2 response (crumbling tumors) less often had > 50% tumor reduction (65% vs 78%) and less often had pCR (14% vs 26%) than patients with a type-1 response (concentric shrinking) halfway through NAC. Rates of tumor diameter reduction > 50% were similar in patients with type-1 and type-2 responses halfway through NAC (40% vs 37%).

There was weak but significant correlation between the MRI-based response patterns halfway through NAC and pathological tumor reduction; lower MRI response patterns (type 0–2) were related to higher pathological tumor reduction rates than higher MRI response patterns (type 3–5) (r = 0.33; p = 0.003 for reader 1 and r = 0.445; p < 0.001 for reader 2).

Nearly all tumors with pCR (n = 16 (20%)) showed a type 0, 1 or 2 response on MRI halfway through NAC (except for one tumor classified as type 4 by reader 2). Considering interobserver agreement, in 40/80 cases (50%) both readers classified the tumor into the same MRI-based response pattern halfway through NAC; in the other half of the cases they disagreed. Interobserver agreement between reader 1 and 2 was therefore considered fair (κ = 0.301).
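The gap between 50% raw agreement and κ = 0.301 comes from chance agreement. As a worked check, the sketch below rearranges the kappa definition, κ = (p_o − p_e) / (1 − p_e), to back out the chance agreement p_e implied by the reported figures. It assumes unweighted kappa. The verbal labels follow the widely used Landis and Koch bands, with which the label "fair" in the text is consistent; whether the authors used exactly these bands is an assumption.

```python
def implied_chance_agreement(p_observed: float, kappa: float) -> float:
    """Solve kappa = (p_o - p_e) / (1 - p_e) for the chance agreement p_e."""
    if kappa >= 1.0:
        raise ValueError("p_e cannot be recovered when kappa is 1")
    return (p_observed - kappa) / (1.0 - kappa)


def agreement_label(kappa: float) -> str:
    """Verbal label for kappa using the Landis and Koch bands (assumed)."""
    if kappa < 0.0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Reported halfway through NAC: 40/80 identical classifications, kappa 0.301.
p_e = implied_chance_agreement(40 / 80, 0.301)
label = agreement_label(0.301)
```

With the reported values, the implied chance agreement is roughly 0.28, so about half of the 50% raw agreement would be expected even if the readers had classified independently with the same category frequencies.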

Halfway through NAC, mean tumor sizes determined on MRI by readers 1 and 2 were 19 mm (range 0–54 mm) and 24 mm (range 0–119 mm), respectively. Tumor diameter differed by more than 5 mm between the two readers’ assessments in 25 patients (31%). Most differences in tumor diameter were in tumors classified as crumbling (58%); they occurred in around 25% (21–33%) of tumors with the other MRI-response patterns.

MRI after NAC - response patterns and interobserver agreement

As shown in Tables 4 and 5, tumors showing a type-0 response (complete response) after NAC had the best pathological tumor response (96% had > 50% tumor reduction and 41% had pCR) followed by both the type 2 (crumbling) and the type 1 response (concentric shrinking tumors), which seem to downsize at similar rates on pathological assessment.

Table 4 MRI-based response patterns of breast carcinomas on breast MRI after NAC and pathological response per MRI-based response pattern
Table 5 Mean percentages of pathological response per MRI-based response pattern (mean percentage of reader 1 and 2) halfway through and after NAC

When we look further into the type-0 response after NAC, we see that 10 out of 16 tumors classified as having a type-0 response (complete radiological response) by one or both readers on MRI after NAC still showed residual tumor on pathological assessment. None of these 10 tumors were triple negative and 2 were lobular carcinomas.

There was no significant correlation between the MRI-based response patterns after NAC and pathological response after NAC (r = − 0.170; p = 0.145 for reader 1 and r = − 0.169; p = 0.146 for reader 2). Of the 16 patients with pCR, 7 did not undergo MRI after NAC. Of the 9 patients who did, 4 were classified as type 0 by both readers (complete response, true negatives), and another was classified as type 0 by reader 1 but as type 3 by reader 2. The other four patients were classified as a type-1 or type-2 response and one was even classified as having type-4 (reader 1) and type-3 (reader 2) responses.

Considering interobserver agreement, in 27/57 cases (47%) both readers classified the tumor into the same MRI-based response pattern after NAC. In the other 53% they disagreed. Interobserver agreement between reader 1 and 2 was therefore also considered fair (κ = 0.312).

The mean residual tumor sizes on MRI according to readers 1 and 2 were 13 mm (range 0–50 mm) and 23 mm (range 0–105 mm), respectively. Mean pathological tumor size after NAC was 21 mm (range 0–105 mm). Tumor diameter after NAC differed by more than 5 mm between the two readers in 22 patients (39%). Tumor diameter differences between readers were least common in complete radiological responders (20%) and were observed in 35%, 40%, 47% and 75% of patients with types 1, 2, 4 and 5 MRI-response patterns, respectively.

Furthermore, tumor diameter differed by more than 5 mm between reader 1 and pathological assessment in 35 tumors (61%, of which size was underestimated in 66%) and between reader 2 and pathological assessment in 33 tumors (58%, of which size was underestimated in 42%). For reader 1, most differences in measurement were in tumors classified as concentric shrinking tumors (70%) followed by the crumbling tumors (64%). For reader 2, most of the differences in measurement were in tumors classified as diffuse enhancing tumors (78%) followed by concentric shrinking (63%) and crumbling (53%) tumors.
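The comparison of MRI size with pathological size can be sketched as follows. The 5 mm threshold is taken from the Methods; the function names, the label strings and the shape of the summary are illustrative assumptions, and the code would be applied to paired measurements rather than to any data shown here.

```python
# Clinically relevant difference from the Methods: more than 5 mm.
RELEVANT_DIFFERENCE_MM = 5.0


def size_discrepancy(mri_mm: float, pathology_mm: float) -> str:
    """Label one tumor by how MRI size compares with pathology (gold standard)."""
    diff = mri_mm - pathology_mm
    if diff > RELEVANT_DIFFERENCE_MM:
        return "overestimated"
    if diff < -RELEVANT_DIFFERENCE_MM:
        return "underestimated"
    return "concordant"  # difference of 5 mm or less


def summarize_discrepancies(pairs):
    """Summarize (mri_mm, pathology_mm) pairs for one reader."""
    labels = [size_discrepancy(m, p) for m, p in pairs]
    discordant = [label for label in labels if label != "concordant"]
    under = discordant.count("underestimated")
    return {
        "n": len(labels),
        "discordant": len(discordant),
        "discordant_fraction": len(discordant) / len(labels) if labels else 0.0,
        # Share of discordant cases in which MRI underestimated size.
        "underestimated_share": under / len(discordant) if discordant else 0.0,
    }
```

The two fractions in the summary correspond to the way the results above are reported: the percentage of tumors differing by more than 5 mm, and within those, the percentage in which size was underestimated.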

Surgery and MRI-based response patterns

Mastectomy was performed in 46 patients and breast conserving therapy in 34 patients. In 7/80 patients the tumor was not completely removed. Characteristics of these seven tumors are displayed in Table 6. There was no significant correlation between MRI-based response patterns halfway through or after NAC and incomplete resection (even though types 1 and 4 seem to predominate). In two of these tumors, the tumor diameter was underestimated by more than 5 mm by both readers, and was underestimated in three tumors by one reader.

Table 6 Characteristics of incompletely removed tumors (tumor-positive margins)

Subtype and MRI-based response patterns

As shown in Table 7, 16% of tumors were classified as ER+/HER2+, 60% as ER+/HER2-, 8% as ER-/HER2+ and 16% as ER-/HER2-. Numbers were too small for subtype analyses, but we observed that among ER+/HER2- tumors only those with a type 0 or 1 MRI pattern halfway through NAC went on to achieve pCR, while in the other subtypes tumors with a type 2 (or exceptionally type 4) pattern halfway through NAC could also achieve pCR.

Table 7 Number of tumors per subtype, number of tumors per subtype with pathological complete response and their MRI-based response patterns

Discussion

One of the main reasons for starting neoadjuvant chemotherapy in patients with breast cancer is to decrease tumor volume. Individual responses to NAC vary, depending on molecular subtype, tumor size and treatment regimen, but also depending on factors that are still unknown. Predicting individual response to NAC remains difficult. Enabling response prediction during NAC would help determine the usefulness of NAC and may lead to alterations in treatment regimen or performing surgery earlier than initially planned. To further explore the possibilities of response prediction, the main goal of this study was to investigate the correlation between six MRI-based response patterns halfway through NAC and pathological evidence of tumor response.

Secondary goals were to compare the predictive value of MRI-based response patterns measured halfway through to after NAC and to evaluate interobserver agreement. Furthermore, to achieve the maximum surgical advantage from NAC, it is essential that tumor response and residual tumor are assessed correctly before surgery. In this study, we compared preoperative tumor diameter on MRI with pathological tumor size and evaluated the assessment of residual tumor in different MRI-based response patterns after NAC.

In this study, there was significant correlation between MRI-based response patterns measured halfway through NAC and pathological evidence of tumor reduction (r = 0.33; p = 0.003 for reader 1 and r = 0.45; p < 0.001 for reader 2). Tumors with a type-0 response (complete radiological response) halfway through NAC had a 90% chance of pathological evidence of tumor reduction > 50% and 83% chance of pCR. Tumors with a type-2 response (crumbling tumors) less often had tumor reduction > 50% (65% vs 78%) and less often had pCR (14% vs 26%) than tumors with a type-1 response (concentric shrinking) halfway through NAC. Tumors with a type-4 (stable disease) or type-5 response (progression) halfway through NAC had the lowest chances of tumor regression. A tumor with a type-4 response (stable disease) halfway through NAC only had a 3% chance of being classified as pCR.

Since nearly all tumors with pCR (n = 16 (20%)) had a type 0, 1 or 2 response halfway through NAC (except for one tumor classified as type 4 by reader 2), we can conclude that for a tumor to have pCR, a type 0, 1 or 2 response is required halfway through NAC.

There was no correlation between MRI-based response patterns measured after NAC (prior to surgery) and pathological evidence of tumor reduction (r = − 0.170; p = 0.145 for reader 1 and r = − 0.169; p = 0.146 for reader 2). The greatest pathological tumor reduction was observed in tumors with a type-0 response after NAC (96% had tumor reduction > 50% and 41% had pCR) followed by type 1 (concentric shrinking) (61% had tumor reduction > 50% and 18% had pCR) and type 2 response (crumbling tumors) (69% had tumor reduction > 50% and 8% had pCR). This implies that even if the radiologist no longer detects suspicious lesions (type 0) on MRI after NAC, around 60% of the patients still have invasive carcinoma or DCIS on pathological examination. In conclusion, as we hypothesized, the MRI-based response patterns measured halfway through NAC correlate better with pathological tumor response than MRI-based response patterns measured after NAC.

In our study both radiologists performed an independent reading and we observed fair interobserver agreement halfway through and after NAC, for both the classification of MRI-based response patterns and for tumor diameter measurements. Only half of the tumors were classified as the same MRI-based response pattern, and tumor diameter differences >5 mm occurred in 31–61% of cases, consisting of both underestimation and overestimation of size, and occurring in all MRI-based response patterns. Nevertheless, both readers’ MRI-based response classifications correlated similarly with pathological response. Only in 7/80 patients was the tumor not completely removed (and 2/7 of these patients had mastectomies). The latter might mean that both experienced readers have different strengths leading to an equal outcome. Hence, this might imply that getting better consensus about how to classify the tumors could still improve the results. This study highlights challenges and limitations in predicting response to NAC and determining residual disease in breast cancer.

Comparison with other studies

In a recent study by Ballesio et al. (n = 51) the authors found that 65% of tumors with the concentric pattern halfway through NAC (n = 13; p < 0.001) had pCR, while none of the non-responders had the concentric pattern [14]. This is partly in agreement with our results. We found that 24% of the tumors with the type-1 pattern and 83% of tumors with the type-0 pattern had pCR; together these patterns presumably correspond to the concentric pattern in their study. However, the percentage of non-responders with a type 1 pattern was 22% in our study. This difference may be due to the smaller number of patients (n = 51 versus n = 80) and the different choice of the MRI-based response patterns used. They only used three response patterns: concentric, nodular and mixed. This way they assumed all tumors shrink during NAC, while in our study 25% of tumors showed a stable response or progression. They also did not include a complete radiological response (i.e. pattern 0) while this was the group of patients with the strongest correlation with pCR in our study. We did not include a mixed pattern because the difference between “crumbling” and “crumbling and shrinking” might be minute and mostly subjective.

Two other studies studying MRI-based response patterns looked at MRI-based response patterns after NAC. The study of Golden et al. (only triple negative tumors, n = 60) had the same MRI-based response categories except they did not include diffuse enhancement (type 3) [15]. Even though our population only included 13 triple negative tumors, we found similar results in that MRI-based response patterns after NAC cannot successfully predict pathological outcome. Finally, our study was predominantly based on the study of Kim et al. (n = 55 (56 lesions)) [8], expanding and adjusting their classification. Like their results, concentric shrinking and crumbling tumors were more frequently observed in the pathological responder group. They more often noted the diffuse enhancing tumors in the non-responder group. As mentioned earlier, our number of diffuse enhancing tumors was too small to analyze. One of the most important additions in our study, as compared to their classification, was the definition of a complete response group (i.e. pattern 0), since this most accurately predicts pathological response. Furthermore, compared to their study, we tested the classification halfway through NAC in a larger cohort, showing stronger correlation to pathological response than after NAC.

One possible explanation for the better prediction of pathological response by MRI halfway through NAC than after NAC is that taxanes might suppress MRI enhancement irrespective of the cytotoxic activity. This finding was reported by Schrading et al. and since in our patient cohort taxanes were also only given during the second half of treatment, this could be a plausible explanation. Furthermore, as already shown by earlier studies, our study also shows that a complete radiological response on MRI might be a false negative, especially when lobular carcinomas are classified as complete responders. In contrast, the two triple negative tumors classified as type 0 were both true negatives.

Strengths, limitations and future implications of this study

We studied the MRI patterns in the largest group of patients so far and only one comparable study was performed recently. Furthermore, this is the first study to look at interobserver agreement in MRI-based response patterns. The other studies that looked at MRI-based response patterns all performed a consensus reading of two radiologists. Therefore, none of them could test interobserver agreement. High interobserver agreement is important and desirable when we want to implement these MRI-based response patterns in broad clinical practice. As mentioned earlier, we observed low interobserver agreement for the MRI-based response patterns. To increase interobserver agreement, a group of experienced radiologists should reach consensus about which tumors to classify under which MRI-based response pattern. Furthermore, it would also be desirable to reduce the interobserver differences in tumor diameter measurements. This could be done, for example, by looking into the strengths and weaknesses of experienced or even dedicated breast radiologists and making currently practicing radiologists aware of these strong points and pitfalls.

One of the limitations of this study is that, as in the clinical setting, tumors were only measured on one slice on MRI and in one cutting direction by the pathologist. If the cutting direction was different from the MRI slice direction, the tumor diameter might differ. Another possible factor influencing diameter differences is that residual DCIS was excluded from the size estimation on pathological assessment, whereas this might have been visible on MRI, and therefore might have been included on the MRI evaluations. Furthermore, if multifocal disease was present, only the lesion with the largest dimensions on baseline MRI, considered to be the index tumor, was included for statistical analysis. Therefore, these results might not be applicable to multifocal tumors.

Future research should look into these MRI-based response patterns in an even larger group of patients with breast cancer. This way subgroup analyses can be performed to look at the differences in response patterns in the different subtypes of breast cancer. Finally, further research should combine all response parameters (including subtypes, these MRI-based response patterns identified halfway through NAC and other parameters proven to help in response prediction, like MRI enhancement patterns and (semi-) automated tumor volume assessment) [16], to form a panel of biomarkers that enables individual response prediction.

Conclusion

In patients with breast cancer undergoing NAC, tumor reduction > 50% was seen in about 70% of tumors and reduction in diameter > 50% was seen in about 40% of tumors. MRI-based response patterns halfway through NAC predicted pathological response more accurately than MRI-based response patterns after NAC. A complete radiological response halfway through NAC was associated with 83% pCR while a complete radiological response after NAC only seemed to be correct in 41% of the cases.