Introduction

Neoadjuvant chemotherapy (NACT) is the recommended treatment option for breast cancer (BC) patients with axillary lymph node (ALN)-positive disease [1]. Achieving pathological complete response (pCR) following NACT, preferably in both the breast and the axilla, is associated with improved prognosis [2, 3]. To avoid surgical over-treatment of the axilla, that is, to abstain from axillary lymph node dissection (ALND) in patients who are subsequently shown to have axillary-pCR, high-performing imaging and/or less invasive surgical staging procedures are needed [4,5,6,7]. The SENTINA trial [4] and the Z1071 trial [5] are both prospective multicenter studies. The SENTINA trial was designed to evaluate the timing of sentinel lymph node biopsy (SLNB) in the NACT setting, and the objective of the Z1071 trial was to determine the false-negative rate (FNR) for sentinel node (SLN) surgery following chemotherapy in women initially presenting with biopsy-proven cN1 BC. In both studies, the primary endpoint was the FNR of SLNB after NACT in patients presenting with upfront cN1 disease. The SENTINA trial and the Z1071 trial showed an FNR of 14% and 13%, respectively, for SLNB performed post-NACT, thus higher than the predefined threshold of 10%. The SLN FNR did not differ according to axillary ultrasound (AUS) results; however, in the Z1071 trial, a strategy in which only patients with normal AUS undergo SLN surgery reduced the FNR among patients with ≥ two SLNs removed from 12.6 to 9.8% [8].

AUS is often the first choice for axillary imaging, while more advanced methods, for instance magnetic resonance imaging (MRI) and 18F-fluorodeoxyglucose positron emission tomography/computed tomography (FDG-PET/CT), are seldom routinely used [9]. At the time of BC diagnosis, abnormal baseline AUS is routinely followed by ultrasound-guided fine needle aspiration cytology (FNAC), a quick, minimally invasive method of axillary staging. FNAC-verified ALN metastasis (ALNM) obviates SLNB, allowing the patient to proceed directly to ALND or, as for the patients in the present study, to NACT followed by ALND [10, 11]. However, current guidelines recommend that SLNB can safely be performed post-NACT for upfront c/pN + patients and used as a discriminator for ALND [12, 13].

From an axillary surgery perspective, correct prediction of axillary-pCR is of utmost interest to enable abstaining from ALND. Identification of predictive imaging biomarkers of axillary-pCR is therefore important. In addition to breast tumor and ALN characteristics, the association between mammographic density (MD) and axillary-pCR is investigated in this study. The timing of SLNB for BC patients treated with NACT is debated, and current guidelines [11, 12] recommend SLNB performed post-NACT and, in case of benign findings, omission of ALND [4, 5].

In this study, we report results from a well-characterized prospective cohort with AUS performed pre- and post-NACT and detailed pathology data on ALN-metastases (ALNM) from both baseline (pre-NACT) and post-NACT (breast surgery and ALND). In addition, prospectively assessed mammographic density (qualitative and quantitative) at the corresponding time points was retrieved. While a large number of studies have investigated the performance of AUS pre-NACT [14, 15], only a few studies have investigated the performance of AUS post-NACT [6, 8, 16,17,18,19] and not all report test performance data [8, 17]. We investigated the test performance measures of AUS in terms of correctly identifying ALNM pre-NACT and, most importantly, post-NACT. Since overweight and a lobular BC subtype could be associated with inferior accuracy of AUS [20,21,22], stratification according to these parameters was performed. We also aimed to investigate AUS parameters, as well as patient and tumor characteristics, as predictors of axillary-pCR.

Methods

The NeoDense-study cohort, a part of the SCAN-B study [23] (Clinical Trials ID NCT02306096), is a prospective cohort of BC patients receiving NACT during 2014–2019 at two sites within Skåne University Hospital, Sweden as previously described [24]. At diagnosis, patients eligible for NACT were included in the study following written consent (N = 207), of whom five patients were excluded due to ineligibility (Fig. 1) [25]. All patients with c/pN + pre-NACT were subject to ALND according to clinical routine and the Swedish National Guidelines at the time of study inclusion (Supplementary Material 1).

Fig. 1 Flowchart

Pre-NACT cohort

For the assessment of AUS at the pre-NACT time point, the whole NeoDense-cohort was used (N = 202), in order to include both abnormal and normal baseline AUS assessments. SLNB was performed pre-NACT for patients with clinically and AUS node-negative disease (cN0) (N = 87). Hence, the pathological diagnosis of ALN at baseline was based on assessment of SLNs in 87 patients and on FNAC in 115 patients with cytology-verified ALNM (Fig. 1). SLNB was performed prior to study inclusion in 78 patients, and for these patients, the axillary evaluation of the preceding diagnostic AUS was used in the statistical analyses of the pre-NACT cohort.

Post-NACT cohort

From the 202 patients enrolled in the NeoDense study, we report on 114 (56%) who had cytology-verified ALNM at baseline and had an ALND performed post-NACT. At the post-NACT time point, patients who had undergone SLNB pre-NACT (N = 87) were excluded, as was one further patient who was part of the SenoMac-study (Clinical Trials ID NCT02240472) and thus did not undergo ALND.

The term “axillary-pCR”, commonly used in the literature to mean no remaining invasive cancer in the axilla following NACT, is in this study used only for patients with FNAC-verified metastases at baseline, that is, patients with no previous surgical removal of ALN (due to SLNB).

Clinical data

Referring to the pre-NACT cohort (N = 202): a total of N = 196 (97%) patients received a standard chemotherapy regimen [3 × fluorouracil, epirubicin and cyclophosphamide (FEC), or epirubicin and cyclophosphamide (EC) + 3 × docetaxel (or equivalent series of paclitaxel)], or the same regimen in the reversed order. In the case of human epidermal growth factor receptor 2 (HER2)-overexpressing BC (N = 49), HER2-blockade was added [N = 46, whereof 94% received double HER2-blockade (trastuzumab and pertuzumab), and the remaining three patients received single trastuzumab]. Data on clinical and pathological parameters were gathered from study-specific forms, medical charts, and clinical pathology reports.

Pathology evaluation

FNAC of suspicious ALN was performed pre-NACT according to clinical routine by breast radiologists prior to study inclusion. Standard procedure included aspiration with a 22-gauge needle (0.7 mm × 50.0 mm). All pathological interpretation was performed according to clinical routine at the pathology department by board-certified cytopathologists. pCR was defined as no residual invasive cancer foci in the breast and axilla (ypT0/is ypN0), in accordance with current guidelines [26].

Imaging

Each patient had imaging examinations performed of the breast (mammography and ultrasound, and for parts of the cohort also breast tomosynthesis) and the axilla (AUS) at three time points: pre-NACT, after two series of NACT (during NACT), and post-NACT; the timing mirrored clinical routine, and a detailed timeline of the cohort has been published previously [24]. Ultrasound assessment of ALNs was performed by experienced breast radiologists (specially trained and working at a breast imaging center, N = 13), and ALNs were considered abnormal or normal based on the following criteria: nodal size, cortical thickening, hilar effacement, echogenicity, and shape [27]. No study-specific criteria for abnormal ALN were used; the assessment of normal/abnormal was at the discretion of the evaluating radiologist, who in clinical practice made a combined overall assessment of size, shape, cortical thickness above 3–4 mm, hilar effacement, and echogenicity. At the time of study inclusion, Breast Imaging-Reporting and Data System (BI-RADS) categorization was not used for AUS. The radiologists prospectively filled in study-specific forms at the time of AUS examinations, including the number of abnormal ALNs and their size(s) [28, 29] and echogenicity. For the post-NACT cohort, the numbers of valid long- and short-axis measurements for abnormal AUS at the different time points are presented in Table 2. The details of the ultrasound machines used are presented in Supplementary Material 2. Axillary radiological complete response (rCR) was defined as no abnormal findings (i.e., findings indicating malignancy) by AUS. Mammographic density was assessed both qualitatively by radiologists according to BI-RADS [30], and quantitatively with the automated software Volpara (version 1.5.4.0, Volpara Solutions Limited, Wellington, New Zealand) [31].
The breast tumor was marked with a radiopaque clip prior to NACT according to clinical routine, while no marking was performed of abnormal ALNs.

Statistics

We summarized cohort baseline characteristics, including pathology results from the breast and axilla. We calculated descriptive statistics according to axillary-pCR for ultrasound features of the breast tumor and ALN at three time points (baseline, during NACT, and post-NACT).

Test performance: For the pre- and post-NACT cohorts, we used axillary node-stage by AUS as a test for axillary node-stage by pathology/cytology, both at baseline (N = 202) and post-NACT (patients having FNAC-verified ALNM at baseline as well as ALND performed, N = 114). We estimated the test performance measures sensitivity, specificity, and positive and negative predictive value (PPV and NPV), with 95% confidence intervals (CI), for the AUS-pathology association at baseline and post-NACT. Subgroup analyses were performed according to body constitution and histopathological subtype.
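As a minimal sketch of how the four test performance measures follow from a 2 × 2 table of AUS result against pathology, the Python snippet below computes each proportion with an approximate 95% CI. The counts are hypothetical and are not taken from this cohort, the function names are illustrative, and the Wilson score interval is an assumption chosen because it needs only the standard library; it is not necessarily the interval method applied by the statistical software used in the study.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (about 95% when z = 1.96)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def diagnostic_performance(tp, fp, fn, tn):
    """Return {measure: (estimate, (ci_low, ci_high))} from 2 x 2 counts.

    tp: abnormal AUS, ALNM on pathology    fp: abnormal AUS, no ALNM
    fn: normal AUS, ALNM on pathology      tn: normal AUS, no ALNM
    """
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts for illustration only (not study data).
example = diagnostic_performance(tp=40, fp=10, fn=20, tn=30)
for name, (estimate, (low, high)) in example.items():
    print(f"{name}: {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

The FNR discussed for SLNB is the complement of sensitivity, fn / (tp + fn), so it can be read from the same table.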

Prediction models of axillary-pCR: post-NACT cohort

We furthermore used simple logistic regression to assess whether baseline patient [age, body mass index (BMI), menopausal status], tumor characteristics [estrogen receptor (ER), HER2, Ki67], histopathological subtype, and imaging characteristics (tumor response, MD, and the number of abnormal ALN by AUS) were associated with axillary-pCR (note that absence of ALNM is considered as outcome). In these models, axillary-pCR was the dependent variable, whereas the individual characteristic was included as an independent variable. To establish the independent association of these characteristics and axillary-pCR, we also conducted a fully adjusted multivariable logistic model. The independent variables in this axillary-pCR-model were deduced from simple and multivariable logistic regression models of different AUS parameters of abnormal ALN (number, long-axis, long/short-axis ratio), ultrasound breast tumor parameters (size and response), and MD at three different time points and their association with axillary-pCR. Each multivariable model included the baseline covariates from the previous model: model 1, age; model 2, model 1 + BMI and menopausal status; and, model 3, model 2 + ER-status + HER2-status + Ki67 (all from core biopsies of the breast tumor at baseline).
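For a single binary characteristic, the odds ratio from a simple logistic regression equals the cross-product ratio of the corresponding 2 × 2 table, which gives a compact way to illustrate the unadjusted analyses. The sketch below assumes a Wald interval on the log-odds scale; the counts and the function name are hypothetical, and the study's own estimates came from statistical software rather than from this code.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio with a Wald CI from a 2 x 2 table.

    a: characteristic present, axillary-pCR   b: characteristic present, no axillary-pCR
    c: characteristic absent, axillary-pCR    d: characteristic absent, no axillary-pCR
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all cells must be positive for the Wald interval")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(odds_ratio) - z * se_log_or)
    high = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, low, high


# Hypothetical counts for illustration only (not study data).
odds_ratio, ci_low, ci_high = odds_ratio_wald(a=12, b=8, c=10, d=30)
print(f"OR {odds_ratio:.2f} (95% CI {ci_low:.2f}-{ci_high:.2f})")
```

The multivariable models cannot be reduced to a single table in this way, because each coefficient is estimated conditional on the other covariates.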

Statistical software

For the test performance measures calculations, MedCalc Statistical Software version 19.2.6 (MedCalc Software Ltd, Ostend, Belgium; https://www.medcalc.org; 2020) was used. Otherwise, IBM SPSS Statistics for Windows, version 26 (IBM Corp., Armonk, N.Y., USA) was used.

Results

Descriptive results

The baseline characteristics of the pre- and post-NACT cohort are displayed in Table 1. Invasive ductal carcinoma was the most common histopathological subtype in both the pre- and post-NACT cohort (165 of 202 (82%) and 97 of 114 (85%), respectively), followed by invasive lobular carcinoma (16 of 202 (8%) and 9 of 114 (8%), respectively). Regarding MD, most patients had intermediate MD (BI-RADS b or c combined accounted for 165 of 202 (82%) and 91 of 114 (80%) of the pre- and post-NACT cohort, respectively). The axillary-pCR rate was 30% (34 of 114).

Table 1 Pre- and post-NACT cohort: patient, tumor, and axillary characteristics pre-NACT, and pathological/radiological ALN-status post-NACT

The number and proportion of abnormal ALNs by AUS decreased during NACT in both the axillary-pCR and the non-axillary-pCR group (Table 2). Post-NACT, the proportion of normal ALN-status by AUS was 77% (26 of 34) in the axillary-pCR group and 61% (49 of 80) in the non-axillary-pCR group. Of the 78 patients with SLNB prior to inclusion, 83% (N = 65) had no abnormal findings by AUS.

Table 2 Post-NACT cohort: ultrasound features of breast tumor and ALN, at baseline, during NACT, and post-NACT according to axillary-pCR

Test performance

Test performance measures of AUS pre- and post-NACT, stratified according to BMI and histological subtype, are presented in Fig. 2. Pre-NACT, a total of 123 of 202 (61%) patients met abnormal AUS criteria (according to the expert judgment of the radiologist); the corresponding number post-NACT was 38 of 114 (33%). AUS performed better at identifying ALNM pre-NACT (N = 202), as reflected by a PPV of 0.94 (95% CI 0.89–0.97) and a sensitivity of 0.81 (95% CI 0.74–0.87). The performance of AUS was inferior post-NACT (N = 114): PPV 0.76 (95% CI 0.62–0.87) and sensitivity 0.35 (95% CI 0.24–0.47). Stratified analyses according to BMI and histological subtype pre- and post-NACT showed no differences (Fig. 2).

Fig. 2 Test performance measures of AUS pre- and post-NACT. Stratification according to body constitution and histopathological subtype

Prediction models of axillary-pCR: post-NACT cohort

Baseline characteristics associated with accomplishing axillary-pCR in the simple and multivariable logistic regression analyses (N = 114) were: premenopausal status (OR 0.08, 95% CI 0.01–0.82), ER-negativity (OR 9.05, 95% CI 2.09–39.14), HER2-overexpression (OR 6.18, 95% CI 1.62–23.56), and mammographic dense breasts (OR 6.98, 95% CI 1.54–31.62) (Table 3). Tumor response as assessed with ultrasound (decrease ≥ 30% in largest diameter) between baseline and “during NACT” showed an association with axillary-pCR in the unadjusted model (OR 2.60, 95% CI 1.11–6.07); however, this association was not retained in the multivariable model (OR 1.48, 95% CI 0.43–5.08). The fully adjusted multivariable model including the 114 patients (adjusting for age, BMI, menopausal status, ER, HER2, and Ki67) is displayed in Supplementary Material 3, showing that the odds of accomplishing axillary-pCR increased with a decreasing number of abnormal ALNs on AUS during NACT (OR 0.46, 95% CI 0.25–0.83) and post-NACT (OR 0.58, 95% CI 0.30–1.10), although the confidence interval for the latter includes 1.
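An odds ratio below 1 for the number of abnormal ALNs can be re-expressed for a decrease in that number by raising it to a negative power. The sketch below assumes that the number of abnormal ALNs entered the model as a count, so that the reported OR applies per additional abnormal node; that coding is an assumption and is not stated in the text, and the function name is illustrative.

```python
def rescale_odds_ratio(or_per_unit, ci_low, ci_high, units):
    """Rescale a per-unit odds ratio and its CI to a change of `units`.

    On the log-odds scale the effect is linear, so the estimate and both
    interval bounds are raised to the power `units`.
    """
    estimate = or_per_unit ** units
    # A negative number of units reverses the order of the bounds.
    low, high = sorted((ci_low ** units, ci_high ** units))
    return estimate, low, high


# Reported "during NACT" estimate, re-expressed per one fewer abnormal ALN
# (assuming the OR is per additional abnormal node).
estimate, low, high = rescale_odds_ratio(0.46, 0.25, 0.83, units=-1)
print(f"OR per one fewer abnormal ALN: {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

Under that assumption, each abnormal node fewer on AUS during NACT corresponds to roughly a doubling of the odds of axillary-pCR.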

Table 3 Post-NACT cohort: simple and multivariable logistic regression analysis of baseline tumor and patient characteristics, and imaging characteristics during/post-NACT, as predictors of axillary-pCR following NACT

Discussion

For BC patients receiving NACT, reliable imaging is needed both at baseline, in the initial staging situation for well-grounded systemic treatment decisions, and post-NACT to optimize surgical treatment decisions. The performance of AUS has been studied multiple times at the initial staging (pre-NACT or pre-primary BC surgery) [14, 15], but has been reported in only a few previous studies post-NACT [6, 16, 18, 19]. We present results from a well-characterized prospective cohort with extensive pathology data (complete data on cytology-proven ALNM at baseline and ALND post-NACT) and a detailed study protocol with sequential imaging (pre-, during, and post-NACT). Adding information to previously published studies, we present clinically valuable performance measures of AUS post-NACT. In addition to the many studies presenting nomograms (predominantly based on baseline data) for prediction of axillary-pCR [17, 32,33,34,35,36,37,38], this study presents novel findings on the association between MD and axillary-pCR.

Test performance

AUS pre-NACT

Our results show that baseline AUS could, to a large extent, correctly identify ALNM; the sensitivity, specificity, and PPV were satisfactory. However, an NPV of 0.65 (95% CI 0.57–0.73) shows that pre-NACT AUS has limitations in correctly identifying metastasis of any size in the axilla. This finding supports current guidelines that patients with clinically and AUS normal axilla at baseline can be staged by SLNB post-NACT without missing important information. The literature shows diverse sensitivity and specificity for the diagnosis of ALNM at baseline with AUS, ranging from 49 to 87% and 53 to 97%, respectively [14, 15]. This variation might partly be explained by the lack of consensus on imaging characteristics or scoring systems for abnormal ALN by AUS [15] and by the high intra- and inter-observer variability of ultrasound as a modality [39].

AUS post-NACT

In more recent years, classification systems for ultrasound evaluation of ALN post-NACT have been presented to determine important ALN characteristics to consider post-NACT [40]. Importantly, in our study, the sensitivity of AUS post-NACT (identifying ALNM) was considerably lower than pre-NACT, but the specificity and PPV were acceptable. Thus, AUS could not identify all (subsequently pathology-verified) ALNM. Previous studies have shown sensitivity and specificity of AUS for identification of ALNM post-NACT of 50–60% and 60–77%, respectively [6, 18]. In the SN FNAC study [19], the PPV and NPV of AUS post-NACT were slightly higher than in the present study (PPV 81% and NPV 48% in the SN FNAC study in comparison to PPV 76% and NPV 38% in our study). These results are to be expected since ultrasound reflects a macroscopic feature, in contrast to the remaining microscopic findings in the pathology specimen. The study samples are similar in size to ours (N ranging from 139 to 157 [6, 18, 19]), except for the large cohort in the SENTINA trial [16].

Other modalities

Several strategies for improving the diagnostic performance of axillary staging by imaging have been proposed. In our study, we used a limited number of easily accessible, clinically established AUS parameters [41]. Other imaging modalities for axillary staging pre- and post-NACT must also be mentioned. In the primary staging setting, a review by Marino et al. showed pooled sensitivity estimates of 75–80% for MRI and 59–69% for FDG-PET/CT, and the corresponding specificity estimates were 89–91% and 90–95%, respectively [15]. When examining the axilla post-NACT, a study using MRI (N = 65) in patients with biopsy-proven ALNM pre-NACT has shown a PPV of 67% and an NPV of 66% in terms of predicting axillary-pCR [42]; the corresponding numbers for AUS in our study were PPV 38% and NPV 76%. A comparative study of AUS, MRI, and FDG-PET/CT reported post-NACT sensitivity of axillary imaging in detecting ALNM of 70% for AUS (N = 106), 61% for MRI (N = 88), and 63% for FDG-PET/CT (N = 32) [43]. Another study comparing the post-NACT test performance measures of different modalities [ultrasound (N = 135), MRI (N = 136), and FDG-PET/CT (N = 99)] and combinations thereof showed NPV ranging from 28 to 48%, the latter from the combination of AUS and MRI [6]. In conclusion, studies of these advanced imaging modalities post-NACT have a limited number of study participants, and a comprehensive overview of axillary imaging post-NACT, including AUS, is thus warranted.
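The PPV of 38% and NPV of 76% quoted for AUS in terms of predicting axillary-pCR appear to mirror the post-NACT PPV of 76% and NPV of 38% reported for identifying ALNM. A likely explanation is that the target condition changes from residual ALNM to axillary-pCR, in which case a normal AUS becomes the "positive" test and the two predictive values trade places. The sketch below illustrates that relabelling with hypothetical counts; the function names are illustrative and the counts are not study data.

```python
def predictive_values(tp, fp, fn, tn):
    """Return (PPV, NPV) from 2 x 2 counts."""
    return tp / (tp + fp), tn / (tn + fn)


def flip_target(tp, fp, fn, tn):
    """Relabel the table so that the former negatives become the target.

    Former test-negative, condition-negative cells become true positives,
    and so on; the four counts themselves are unchanged.
    """
    return {"tp": tn, "fp": fn, "fn": fp, "tn": tp}


# Hypothetical counts with ALNM as the target condition (not study data).
counts_alnm = {"tp": 30, "fp": 10, "fn": 45, "tn": 25}
ppv_alnm, npv_alnm = predictive_values(**counts_alnm)

# Same patients, now with axillary-pCR as the target condition.
ppv_pcr, npv_pcr = predictive_values(**flip_target(**counts_alnm))

print(f"ALNM as target: PPV {ppv_alnm:.2f}, NPV {npv_alnm:.2f}")
print(f"pCR as target:  PPV {ppv_pcr:.2f}, NPV {npv_pcr:.2f}")
```

When comparing predictive values across studies, it is therefore worth checking which condition each study treats as the target.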

AUS, BMI, and histopathological subtype

Since clinical axillary palpation might be more challenging in overweight/obese patients, AUS is of even greater importance for these patients. In overweight and obese patients, AUS could be afflicted with inferior performance due to technical challenges and obesity-related ALN-alterations [44, 45]; however, studies of baseline AUS point toward no impediment [45]. To the best of our knowledge, test performance of AUS post-NACT in relation to BMI has not previously been reported in the literature. In line with studies of AUS at baseline, we found no difference in AUS performance according to BMI at either time point. Previous studies [20, 21] have indicated inferior accuracy of AUS in lobular cancer. In the present study, results on AUS performance at each time point were inconclusive due to the insufficient number of patients in the lobular histopathology groups (N = 14 pre-NACT and N = 9 post-NACT).

Prediction models of axillary-pCR: post-NACT cohort

An important finding in our study comes from the simple logistic regression analysis of patient and tumor characteristics: the associations between ER-negativity and HER2-overexpression, respectively, and axillary-pCR were more pronounced than those for many of the imaging characteristics of the breast and the axilla. Our results are in line with previous studies presenting predictive models, recognizing the importance of pre-NACT tumor characteristics [17, 32,33,34,35,36,37,38, 46]. These studies report odds ratios for axillary-pCR similar to those in the present study, thus adding credibility to our results. Similar to the tumor in the breast, ALN response to NACT is dependent on BC tumor subtype [47].

Mammographic density

Mammographic density, reflecting the radiodense stroma and epithelium of the breast on a mammogram [48], is associated with increased risk of BC development [49], higher risk of recurrence [50], and possibly poorer response to treatment [51, 52], although inconsistent results have been presented [24]. BC tumors in mammographic dense breasts are often larger at diagnosis and more often have positive ALN [53], thus justifying exploring the association between MD and rate of axillary-pCR. We did not find any association between MD assessed with Volpara and the likelihood of accomplishing axillary-pCR. In contrast, the BI-RADS assessment showed that dense breasts (BI-RADS c/d) were associated with higher odds of accomplishing axillary-pCR in comparison to non-dense breasts (BI-RADS a/b), an association more pronounced at the later time points. To the best of the authors’ knowledge, no previous studies have addressed MD vs. axillary-pCR.

Future perspective

The timing of SLNB is under scientific and clinical debate [4]. In 40–65% of patients with positive SLN at baseline, the SLN(s) is expected to be the only positive ALN, meaning that many of these patients do not have ALNM left in the axilla during and post-NACT [54, 55]. SLNB offers reliable staging at baseline [56]; in patients with a benign SLNB pre-NACT, it is considered safe to omit further axillary surgery, provided that the disease does not progress during NACT [57]. Patients in our cohort were treated according to this clinical algorithm. Correspondingly, there is an ongoing discussion of alternative treatment strategies to ALND for upfront ALN-positive patients with axillary-rCR post-NACT. To reduce the morbidity related to ALND [58], less invasive procedures and treatment strategies are warranted. High demands are put on imaging, of which ultrasound is considered to be the preferred choice for axillary assessment [9] and is used as a discriminator for patients eligible for SLNB [9]. Likewise, whether performed pre-NACT or, as current guidelines recommend, post-NACT [12, 13], SLNB must have high test performance, most importantly a low FNR [59]. Our results indicate that AUS is a good predictor of ALNM at baseline, supporting abstaining from SLNB before NACT. However, AUS assessment post-NACT was not able to correctly diagnose remaining tumor deposits in patients with ALNM at the time of diagnosis.

Strengths and limitations

Our study has many strengths, including the prospective design with detailed data on patient, tumor, and breast characteristics at several time points. Since the guidelines are currently changing, recommending SLNB post-NACT, a study similar to ours and the SENTINA study [4] will be difficult to perform in the future. Access to breast MRI could possibly have been beneficial; however, axillary MRI (provided a dedicated axillary protocol) has a minor role in axillary assessment [15]. As in many imaging studies of the axilla, the lack of a standardized system for reporting findings (e.g., BI-RADS [30]) contributes to a wide variety of AUS test performance measures. To the best of the authors’ knowledge, no guidelines exist on a national or European level regarding assessment of abnormal/normal ALN. Another shortcoming is the incomplete data on ALN short-axis measures (used in the RECIST criteria [60]). Consequently, only a limited number of patients had data on long/short axis ratio, an established measure mirroring the shape of the ALN (round or elongated) [41]. At baseline, according to clinical routine, FNAC was used to verify abnormal ALN by AUS. However, core biopsy of the ALN is considered to have higher diagnostic accuracy and is currently being introduced [61]. Selection bias should be briefly addressed; although many patients had a positive ALN at baseline, because NACT is a preferred treatment option for patients with cytology/pathology-verified ALN at baseline, an abnormal ALN-status by AUS was not an inclusion criterion, and the selection bias should thus be minor.

Conclusion

Prior to NACT, AUS could, to a large extent, correctly identify abnormal ALN, supporting the omission of SLNB pre-NACT. In contrast, AUS alone is not sufficient to determine remaining ALNM post-NACT, whereas tumor biomarkers at baseline are predictive of axillary-pCR. We found no difference in AUS performance according to BMI at any time point. Larger multi-center studies are needed to evaluate the performance of AUS post-NACT. Investigation of other imaging modalities for treatment evaluation post-NACT is encouraged.