Background

Treatment for early-stage breast cancer is focused on minimizing axillary surgery. The IBCSG 23–01 trial [1] demonstrated that patients with micrometastases in sentinel lymph nodes (SLNs) can be spared axillary lymph node dissection (ALND). Furthermore, ALND does not provide any additional benefit in patients with 1–2 positive SLNs who received breast-conserving surgery (BCS), as demonstrated in the Z11 trial [2]. Ongoing studies [3,4,5] are attempting to extend the results of the Z11 trial to mastectomy patients. The SOUND trial and the recent NCT01821768 trial [6] have been designed to explore the possibility of abandoning sentinel lymph node biopsy (SLNB) in a select group of patients [7]. However, the safety of the selection criteria used in these studies is unconfirmed. Predictive models for axillary lymph node (ALN) status would help to identify patients who are likely to have negative ALNs and who could therefore be spared SLNB. Such models, presented as nomograms, have been reported and validated in different populations [8,9,10,11]. However, none has been widely accepted in clinical practice, possibly due to the lack of external validation in a large population.

In addition, most of the reported models were designed to predict the probability of having any positive ALNs (≥1 positive ALNs). It is also important to predict the probability of having N2–3 disease (≥4 positive ALNs) for clinical decision making. For example, in patients who fit the Z11 criteria and did not receive ALND, successful prediction of the axillary tumor burden may be informative for radiation oncologists in the determination of radiation fields.

The National Cancer Database (NCDB) is a joint program of the American College of Surgeons and the American Cancer Society. The database includes more than 1500 cancer programs in the United States with detailed tumor pathology information and overall survival data. Since 2010, data concerning HER2 status and lymphovascular invasion (LVI) have been available in the NCDB. In this study, we used data from the NCDB to develop novel and accurate nomograms that can predict the probability of having any positive ALNs and N2–3 disease. The wide range of patients represented in the NCDB may help to improve the robustness and generalizability of the novel nomograms.

Methods

Patient selection

We searched the NCDB registry dataset between 2010 and 2013 and identified female breast cancer patients using the following criteria:

Inclusion criteria

  1) Year of diagnosis ≥2010 (LVI and HER2 status have been available since 2010)

  2) Female gender

  3) A known number of lymph nodes was examined, and a known number of positive ALNs was reported

  4) The location of the tumor was known (PRIMARY_SITE coding: C501; C502; C503; C504; C505)

Exclusion criteria

  1) T-stage unknown, DCIS or T4 patients, or tumor size larger than 10 cm

  2) Phyllodes tumor

  3) Presence of metastatic disease at the time of diagnosis

  4) Neoadjuvant chemotherapy

  5) Patients with a prior tumor diagnosis

  6) Patients with radical mastectomy, extended radical mastectomy or unknown surgery type

  7) Bilateral breast cancer

  8) Patients with overlapping lesions of the breast, multicentric lesions, or lesions that involved the entire breast (PRIMARY_SITE coding: C508; C509)

  9) Tumor grade unknown, except for lobular carcinoma

  10) ER, PR, and HER2 status unknown; HER2 borderline patients were also excluded

  11) Unknown LVI status
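The inclusion and exclusion criteria above can be read as a record-level filter. The following is a minimal Python sketch of that idea, not the code used in the study. All field names are hypothetical and are not actual NCDB variable names (only the PRIMARY_SITE codes come from the text), and only a subset of the eleven exclusion criteria is shown.

```python
from dataclasses import dataclass
from typing import Optional

# PRIMARY_SITE codes kept by inclusion criterion 4 (taken from the text).
ELIGIBLE_SITES = {"C501", "C502", "C503", "C504", "C505"}


@dataclass
class Record:
    # Hypothetical field names, for illustration only.
    year: int
    sex: str                        # "F" or "M"
    primary_site: str
    nodes_examined: Optional[int]   # None = unknown
    nodes_positive: Optional[int]   # None = unknown
    t_stage: Optional[str]          # "T1".."T4", "Tis" for DCIS, None = unknown
    size_cm: Optional[float]
    metastatic: bool
    neoadjuvant_chemo: bool
    lvi: Optional[bool]             # None = unknown


def meets_inclusion(r: Record) -> bool:
    """Inclusion criteria 1 to 4."""
    return (
        r.year >= 2010
        and r.sex == "F"
        and r.nodes_examined is not None
        and r.nodes_positive is not None
        and r.primary_site in ELIGIBLE_SITES
    )


def is_excluded(r: Record) -> bool:
    """Exclusion criteria 1, 3, 4 and 11 only; the others are omitted here."""
    return (
        r.t_stage in (None, "Tis", "T4")
        or (r.size_cm is not None and r.size_cm > 10)
        or r.metastatic
        or r.neoadjuvant_chemo
        or r.lvi is None
    )


def eligible(r: Record) -> bool:
    return meets_inclusion(r) and not is_excluded(r)
```

Keeping inclusion and exclusion as separate predicates mirrors the two lists and makes it easy to count how many records each step removes.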

This was a retrospective study using anonymous, de-identified data from the NCDB. The authors could not access any information that could identify individual participants; therefore, this study was exempt from review by the Johns Hopkins Medicine Institutional Review Board and the Sun Yat-sen Memorial Hospital ethical committee, and no consent was required.

Statistical analysis

Patients diagnosed from 2010 to 2011 and from 2012 to 2013 with ≥1 nodes examined were defined as the training cohort and validation cohort, respectively, for predictive model development and validation.
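The cohort definition above is a temporal split rather than a random one. A small Python sketch of the rule follows; the function name and the string labels are illustrative assumptions.

```python
from typing import Optional


def assign_cohort(year_of_diagnosis: int, nodes_examined: int) -> Optional[str]:
    """Return the cohort for a patient, or None if the patient is in neither."""
    if nodes_examined < 1:
        return None  # both cohorts require at least one node examined
    if year_of_diagnosis in (2010, 2011):
        return "training"
    if year_of_diagnosis in (2012, 2013):
        return "validation"
    return None  # outside the 2010 to 2013 search window
```

Because the validation patients were diagnosed later than the training patients, validation performance also reflects any drift in practice or coding over time.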

We used the Chi-square test to identify risk factors for positive ALNs. The statistically significant (P < 0.001) risk factors were considered to be potential predictors of ALN status and were all included in the full model. We used a binary logistic regression model to develop a predictive model for ALN status. We used the Akaike information criterion (AIC) and ROC analysis to identify the optimal model. We used the full population to develop a prediction model (Model-A) for the risk of having any ALNs(+). Next, we developed a model (Model-B) to estimate the conditional probability of having pN2–3 disease, given that the patient had ALNs(+). Model-B was therefore developed in ALN-positive patients only, and patients with <10 ALNs examined and ≥1 positive ALNs (N = 23,106) were excluded.
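For readers less familiar with AIC-based model selection: for a fitted model with k estimated parameters and maximized log-likelihood ln L, AIC = 2k − 2 ln L, and the model with the lower AIC is preferred. The Python sketch below computes this for a binary outcome from predicted probabilities. It is an illustration only (the study used STATA and R), and the function names are hypothetical.

```python
import math
from typing import Sequence


def log_likelihood(labels: Sequence[int], probs: Sequence[float]) -> float:
    """Bernoulli log-likelihood of 0/1 labels given predicted probabilities."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )


def aic(labels: Sequence[int], probs: Sequence[float], n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood(labels, probs)
```

The 2k term penalizes extra predictors, which is why a smaller variant model can be preferred over a full model with a similar fit.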

We used the “rms” package of the R software to develop nomograms to visualize our predictive model graphically. Nomogram-A estimated the probability of having any positive ALNs (P_any). Nomogram-B estimated the conditional probability of having pN2–3 disease (P_con). The probability of having pN2–3 disease can be calculated as P_any*P_con.
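The way the two nomograms combine can be sketched in a few lines of Python. The function name and the example values are illustrative assumptions, not results from this study.

```python
def absolute_pn23_risk(p_any: float, p_con: float) -> float:
    """Absolute probability of pN2-3 disease.

    p_any: probability of any positive ALNs (read from nomogram A)
    p_con: probability of pN2-3 given any positive ALNs (read from nomogram B)
    """
    for p in (p_any, p_con):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    return p_any * p_con


# Hypothetical patient: 20% risk of any positive ALNs and, if node-positive,
# a 30% risk of pN2-3 disease.
risk = absolute_pn23_risk(0.20, 0.30)
```

The product works because pN2–3 disease implies at least one positive ALN, so P(pN2–3) = P(any positive) × P(pN2–3 | any positive).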

We used ROC analysis and calibration plots to evaluate the discriminative ability and accuracy of the models, respectively. The performance of the models was evaluated internally in the training cohort and validated externally in the validation cohort.
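The area under the ROC curve (AUC) used to summarize discrimination equals the probability that a randomly chosen node-positive patient receives a higher predicted risk than a randomly chosen node-negative patient, with ties counted as one half. The Python sketch below computes it directly from that definition. It is for illustration only: the pairwise loop is too slow for a cohort of this size, and the function name is an assumption.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two groups.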

For sensitivity analysis, we randomly selected 500, 5000 and 50,000 patients from the study population and calculated the AUC values of the model in these sub-populations. We repeated the sampling 200 times and calculated the mean and standard deviation of the AUC values to assess their stability.
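The repeated-sampling procedure can be sketched as follows in Python. This is an illustration under stated assumptions: sampling is done without replacement, `metric` stands for any function of (labels, scores) such as an AUC routine, and the function name, the seed argument and the `event_rate` helper are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence, Tuple


def subsample_metric(
    metric: Callable[[Sequence[int], Sequence[float]], float],
    labels: Sequence[int],
    scores: Sequence[float],
    size: int,
    repeats: int = 200,
    seed: int = 0,
) -> Tuple[float, float]:
    """Mean and standard deviation of `metric` over repeated random subsamples."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    n = len(labels)
    values = []
    for _ in range(repeats):
        idx = rng.sample(range(n), size)  # draw `size` patients without replacement
        values.append(metric([labels[i] for i in idx], [scores[i] for i in idx]))
    return statistics.mean(values), statistics.stdev(values)


def event_rate(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Toy metric for the usage example: proportion of positive labels."""
    return sum(labels) / len(labels)


mean_rate, sd_rate = subsample_metric(event_rate, [0, 1] * 50, [0.5] * 100, size=10)
```

A small standard deviation across the 200 repeats at each sample size indicates that the metric is stable with respect to which patients happen to be sampled.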

All of the statistical analyses were performed using STATA 13.0MP and R.

Results

Clinicopathological features

This study included 201,452 breast cancer patients cataloged in the NCDB, with a median age of 61 years. The clinicopathological features are listed in Table 1. There were 99,618 and 101,834 patients in the training and validation cohorts, respectively. Patient features were similar between the training cohort and the validation cohort.

Table 1 Clinicopathological features of the study populations

Nomogram for predicting risk of any positive ALNs

We used Chi-square analysis and logistic regression as the univariate and multivariate analyses, respectively, to evaluate the risk factors for any positive ALNs in the training cohort. Age, location of lesions, T-stage, histology, ER, PR, HER2, tumor grade and LVI were significant predictors of any positive ALNs in the univariate analysis (Table 2). These variables were confirmed as independent factors in the multivariate analysis and were incorporated in the full model. We also tested several variant models that included different sets of variables. The full model had an AIC and C-index similar to those of variant model 2 (Additional file 1 : Table S1), and the latter consisted of fewer variables. Therefore, we selected variant model 2 (with age, quadrant, size, histology, grade and LVI as predictors) for the development of nomogram-A to predict the risk of any positive ALNs (Fig. 1).

Table 2 Analysis of risk factors for any positive ALNs

Fig. 1 a) Nomogram to predict the probability of having any positive ALNs (P_any); b) Nomogram to predict the conditional probability of having N2–3 disease (P_con), when the patients have any positive ALNs. The absolute probability of having N2–3 disease can be estimated by P_any*P_con

Nomogram for predicting pN2–3 disease in patients with any positive ALNs

To predict pN2–3 disease in patients with any positive ALNs, we excluded patients with negative ALNs. Patients with <10 ALNs examined and ≥1 positive ALNs were also excluded (N = 23,106). Univariate analysis suggested that age, location of lesions, T-stage, histology, ER, PR, HER2, tumor grade and LVI were risk factors for pN2–3 disease in patients with any positive ALNs (Table 3). These variables, except for ER and PR status, were confirmed as independent risk factors in the multivariate analysis. The full model was selected because it had the lowest AIC and the highest C-index (Additional file 1 : Table S1). Nomogram-B (Fig. 1) was developed to predict the conditional probability of having pN2–3 disease, given that patients have ≥1 positive ALNs.

Table 3 Analysis of risk factors for pN2–3 disease

Distribution of the predicted probability

The training cohort and the validation cohort exhibited similar distributions of the risks predicted by the new models (Additional file 2 : Fig. S1). Most of the predicted risks of any positive ALNs ranged between 0 and 20%. Most of the predicted risks of pN2–3 disease ranged between 10% and 50% in patients with any positive ALNs and between 0% and 10% in the whole population.

Validation of the nomograms

The AUC values of the nomograms (Additional file 1 : Table S1) for predicting any positive ALNs and pN2–3 disease were 0.788 and 0.680 in the training cohort and 0.786 and 0.677 in the validation cohort, respectively. The calibration plot (Fig. 2) suggested that the nomograms were well-calibrated. The average estimation errors of predicting any positive ALNs and pN2–3 disease were 0.78% and 0.85% in the training cohort and 1.14% and 2.79% in the validation cohort, respectively.
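One simple way to turn a calibration plot into a single "average error" figure is to group patients by predicted risk and average the absolute gap between the mean predicted risk and the observed event rate in each group. The Python sketch below does this with equal-sized groups. It is an assumption-laden illustration and not necessarily how the errors reported above were computed; the function name and the default of ten groups are hypothetical.

```python
from typing import Sequence


def mean_calibration_error(
    labels: Sequence[int], probs: Sequence[float], n_bins: int = 10
) -> float:
    """Average |mean predicted risk - observed rate| over risk-ordered groups."""
    if not labels:
        raise ValueError("no patients supplied")
    pairs = sorted(zip(probs, labels))  # order patients by predicted risk
    n = len(pairs)
    errors = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # can happen when there are fewer patients than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(y for _, y in chunk) / len(chunk)
        errors.append(abs(mean_pred - observed))
    return sum(errors) / len(errors)
```

A value near zero means that predicted and observed risks agree closely across the risk range, which is what a calibration curve lying on the diagonal shows graphically.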

Fig. 2 Calibration plots of nomogram-A to predict the probability of having any positive ALNs in the a) training and b) validation cohort, and nomogram-B to predict the conditional probability of having N2–3 disease in the c) training and d) validation cohort

Sensitivity analysis

For sensitivity analysis, we randomly selected 500, 5000 and 50,000 patients from the training and validation cohorts and performed the ROC analysis and calibration plot analysis. We repeated the re-sampling 200 times to obtain a reliable estimation of the AUC values and average prediction error between the actual and predicted risks. As shown in Additional file 3 : Table S2, the estimated AUC values and average prediction error were similar among sub-populations with varied sample sizes.

Discussion

Accuracy of the nomograms

The first predictive model for ALN status was developed a decade ago by Bevilacqua et al. [8]. The authors retrospectively reviewed the database of MSKCC and identified 3786 and 1545 breast cancer patients as training and validation sets, respectively. A nomogram was developed using age, tumor size, special pathology type, location, LVI, multifocal status, nuclear grade, and ER and PR status as predictors of ALN status. Chen et al. [10] validated the MSKCC model in a Chinese population (n = 1545) and reported a new nomogram (the Shanghai model) using data from Chinese breast cancer patients. However, the MSKCC model did not incorporate HER2 status. Reyal et al. [11] reported that molecular subtype approximation, including ER, PR and HER2, is also a determinant of ALN status, and another nomogram was later developed (the Paris model). Additionally, several more models [9, 12,13,14,15] have been developed to predict ALN status. However, none of these models has been widely accepted by treatment guidelines, and clinical practice has not significantly changed. A lack of sufficient evidence to support external validity is one of the major underlying reasons. In addition, these models can only predict the risk of having ≥1 positive ALN. In the current study, we used a large multi-institutional NCDB population to develop and validate a set of nomograms that can predict the risk of having any positive ALNs and N2–3 disease.

Benefit of the new nomograms in the post-Z0011 era

The Z11 study [2] demonstrated that patients with 1–2 positive SLNs receiving BCS and standardized adjuvant therapies could be spared from ALND [16]. However, it is impossible to know whether a patient fits the Z11 criteria or not before surgery, as the number of positive SLNs can only be identified during or after surgery. Our nomograms may be able to identify patients who may not fit the Z11 criteria by predicting the risk of having N2–3 disease preoperatively. If a patient had a high risk of having N2–3 disease, she may be unlikely to fit the Z11 criteria.

Because mastectomy patients were not included in the Z11 study, ALND is still a routine procedure for SLN-positive mastectomy patients. However, several retrospective studies have suggested that omitting ALND is feasible in selected mastectomy patients with positive SLNs [4, 17, 18]. A prospective randomized trial has also been initiated to test this hypothesis (NCT02112682). The trend towards extending the Z11 conclusions to mastectomy patients is therefore clear, and with the help of these nomograms, surgeons may feel safer omitting ALND in selected mastectomy patients with positive SLNs.

One concern related to omitting ALND in mastectomy patients is whether radiotherapy (RT) should be given. The NCCN guidelines [19] clearly recommend RT to the infraclavicular region, supraclavicular area, and internal mammary nodes for patients with N2–3 disease (≥4 positive ALNs). For patients with 1–3 positive nodes (N1 disease) after mastectomy, radiotherapy coverage of these areas was considered controversial by NCCN panel members [19] because the high-level evidence is contradictory [20,21,22,23]. With our nomograms, omitting ALND in selected mastectomy patients with positive SLNs may not be a major problem, as the radiation oncologist can estimate the risk of N2–3 disease and plan treatment accordingly. Additionally, these nomograms may help radiation oncologists in two ways: 1) they may help reassure them that patients who meet the Z11 criteria do not need additional radiation therapy, in terms of enlarging the tangents/fields for radiation; 2) for patients who receive neoadjuvant chemotherapy, RT decisions should be based on pre-chemotherapy tumor features regardless of the tumor response [18], and our nomograms may provide useful information by estimating the axillary tumor burden before neoadjuvant chemotherapy is initiated.

Benefit of the new nomograms in the “SLNB-sparing” era

In the post-Z11 era, when “the days are numbered for axillary surgery” [24], it is likely that SLNB could be omitted in selected patients. The SOUND trial [7], proposed in 2012, was designed to test this hypothesis. In the SOUND trial, T1 breast cancer patients with a clinically negative axilla were randomized to either observation or SLNB. Only 12.8% of patients in the SLNB group had positive ALNs, suggesting a high probability that SLNB could be spared in the future. The development of the nomograms is consistent with the trend towards the “SLNB-sparing” era. If we could identify node-negative patients preoperatively, the omission of SLNB would be much safer, without the need to wait for the results of the SOUND trial. Sparing SLNB would improve quality of life and reduce medical costs and surgical complications. The authors recently reported that the physical function of the upper limb in the no-SLNB group was significantly better than in the SLNB group, suggesting the benefit of minimizing axillary surgery when appropriate [25].

In the “SLNB-sparing” era, if the SOUND trial demonstrates that omitting SLNB in selected patients is safe, there will be concerns about how the absence of axillary staging affects decisions on adjuvant therapies. For example, post-mastectomy radiotherapy (PMRT) is not necessary in T1–2 patients with negative ALNs, whereas in patients with positive ALNs, PMRT is strongly recommended by the NCCN guidelines [19]. The need for radiotherapy also influences the optimal timing of breast reconstruction (e.g., immediate vs. delayed). Additionally, a T1a patient with HER2-positive disease may be spared chemotherapy if the ALNs are negative, whereas adjuvant chemotherapy is recommended for patients with positive ALNs [19]. Taken together, the ability to predict the probability of having any positive ALNs (or N0 disease) would be helpful in a future “SLNB-sparing” era.

Limitations

There are several limitations to this study.

First, several of the predictors of these models, such as LVI and multifocal lesions, may not always be available prior to surgery. Core needle biopsy may not provide an adequate volume of tissue for the identification of LVI. These issues may limit the utility of the models developed in the current study. However, ultrasound-guided vacuum-assisted biopsy [26, 27] has been used by many institutions and may provide a larger volume of tissue for the identification of LVI. Current imaging modalities can provide an accurate estimate of tumor size and multifocality [28,29,30,31].

Second, several important variables are not available in the NCDB, such as whether the tumor was palpable and the Ki-67 status. More importantly, the clinical axillary status was also not available in this study. In patients with a clinically negative axilla, the probability of having N2–3 disease is very low. The performance of the models in these patients needs to be validated.

Third, the NSABP B-32 trial suggested a 10% false-negative rate of SLNs. In our study, patients with SLNB only, without any positive SLNs, were classified as having no positive ALNs. This limitation cannot be avoided when using data from the modern era when SLNB is the routine practice. However, we believe that the 10% false-negative rate may not significantly affect the performance of our model.

Fourth, when developing nomogram-B for predicting the conditional probability of having N2–3 disease, we excluded 23,106 patients (11.5%, 23,106/201,452) with ≥1 positive ALNs but fewer than 10 ALNs evaluated. We may have skewed the data by excluding these patients. However, we considered that the benefit of excluding these patients might outweigh the harm of including them. As demonstrated in the Z11 study, patients with 1–2 positive SLNs without further ALND may theoretically have a 27% risk of additional positive ALNs. The exact number of positive ALNs in these patients was therefore unknown, and including them would have introduced inaccuracy into model development and validation.

Fifth, these nomograms can only be used in patients with a single focus of disease and only in patients with unilateral disease.

Conclusions

In this study, we used a large multi-institutional NCDB population to develop a set of nomograms to predict nodal status in breast cancer patients. Future validation studies are needed to confirm our findings.