Synopsis

A retrospective cohort of 12,572 patients with early breast cancer (BC) who underwent sentinel node (SN) biopsy was randomly divided into two separate patient sets to develop, validate and compare predictive nomograms for the risk of lymph node (LN) metastases, using clinical and pathologic variables obtained from either the surgical specimen or the biopsy.

Background

In breast cancer (BC), nodal status is a major prognostic factor that determines therapeutic decisions to a large extent. Sentinel lymph node biopsy (SLNB) provides a reliable assessment of axillary status in early clinically node-negative BC [1]. Since it also causes less morbidity than axillary lymph node dissection (ALND), it is now considered a standard-of-care procedure. The omission of completion ALND in patients with negative sentinel lymph nodes (SLN) has been recognized as a reasonable approach since the publication of the NSABP B-32 results [2]. Moreover, with regard to survival outcomes, it is likely that this omission can be safely extended to patients with minimal SLN involvement (isolated tumor cells and micro metastases) [3, 4]. Indeed, 40 to 70% of these patients do not have metastatic non-sentinel lymph nodes (NSLN) [5]. The main predictors of LN metastases are tumor size, grade, lymphovascular invasion (LVI), age at diagnosis, extracapsular extension of the positive SLN, and hormonal and HER2 receptor status [6,7,8,9,10]. In addition, numerous studies have shown a strong correlation between BC molecular subtypes and/or tumor phenotypes (determined by hormonal receptor and HER2 status) and axillary status [11,12,13,14,15,16].

Determining the risk of positive axillary LN can contribute significantly to therapeutic decisions. However, this risk cannot be directly inferred from the results of multivariate analyses, which provide only broad statistical information. Only an appropriate prediction tool, such as a nomogram, can indicate the individual risk of a given patient. Nomograms can also be used to compare populations from different studies. A large cohort is necessary to reliably determine the probability of positive SN, particularly for less frequent tumor phenotypes. In 2011, Reyal et al. published such a nomogram predicting the risk of SN metastases [11], built on a training set of 1543 early-stage BC patients and validated on two cohorts of 615 and 496 patients respectively. This model was further validated in a cohort of 755 consecutive patients treated at Institut Curie in 2009 [17].

The aim of our study was to develop and compare the performance of multivariable models predicting LN metastases, including nomograms derived from logistic regression, using as explanatory variables clinical and pathologic variables obtained either from the surgical specimen or from the biopsy alone.

Methods

Patients

Our cohort consisted of 12,572 consecutive patients with small (≤ 30 mm based on clinical and radiologic findings), clinically node-negative invasive BC, who did not receive neoadjuvant therapy and underwent SLNB between 1999 and 2012 at 13 French centers. HER2 status was determined for all patients. During the first years of the study, ALND was systematically performed in some sites; thereafter, ALND was performed only in case of SN involvement, a practice that was uniform across all the participating sites.

Evaluation

The following data were retrieved: characteristics of patients (age at the time of SLNB) and tumors [size, clinical stage, histological type, estrogen (ER), progesterone (PR) and HER2 status, LVI, Scarff-Bloom-Richardson (SBR) grade], description of ALND (number of LN sampled and involved), and results of the pathological examination of surgical resection specimens. Tumor size was determined from the results of pathological examination but could also be evaluated preoperatively by mammography, sonography and, in selected cases, MRI (clinical T stage). LVI was assessed on the surgical specimen.

Tumor phenotype was defined by the combination of ER, PR and HER2 status, evaluated by immunohistochemistry (IHC) and confirmed by FISH in case of IHC-HER2 2+. Positivity for ER and PR was determined according to French guidelines (≥ 10% of cancer cells expressing ER/PR). Five molecular subtypes were defined according to clinico-pathological criteria [18]. Because information on Ki-67 was not available, we used grade to capture cell proliferation, as described by von Minckwitz et al. [19]. The following definitions were used: triple-negative (basal-like, HER2−/HR−), HER2 positive (non-luminal, HER2+/HR−), and luminal (HR+), divided into luminal A (HR+/HER2−/grade 1 or 2), luminal B-HER2-negative like (HR+/HER2−/grade 3), and luminal B-HER2-positive like (HR+/HER2+, all grades).
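
The subtype definitions above amount to a small decision rule. The following is a minimal sketch of that rule, not the authors' code: the function name `molecular_subtype` and its boolean arguments are hypothetical, and HR+ is taken to mean ER+ and/or PR+, as stated in the Results.

```python
def molecular_subtype(er_positive, pr_positive, her2_positive, grade):
    """Return the surrogate molecular subtype from receptor status and SBR grade.

    Hypothetical helper: grade replaces Ki-67 as the proliferation marker,
    following the definitions given in the text.
    """
    if grade not in (1, 2, 3):
        raise ValueError("SBR grade must be 1, 2 or 3")
    hr_positive = er_positive or pr_positive  # HR+ means ER+ and/or PR+
    if not hr_positive:
        # HR- tumors are split by HER2 status only
        return "HER2 positive (non-luminal)" if her2_positive else "triple-negative"
    if her2_positive:
        # HR+/HER2+ is luminal B-HER2-positive like, whatever the grade
        return "luminal B-HER2-positive like"
    # HR+/HER2-: grade separates luminal A from luminal B-HER2-negative like
    return "luminal B-HER2-negative like" if grade == 3 else "luminal A"


print(molecular_subtype(True, False, False, 2))   # luminal A
print(molecular_subtype(False, False, True, 3))   # HER2 positive (non-luminal)
```

Grade only changes the result for HR+/HER2− tumors, which mirrors its role as a stand-in for Ki-67.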

Although the methods used for histological examination were not standardized in the protocol, all sites proceeded similarly: serial sections were performed every 200 μm and stained with standard hematoxylin and eosin. The number of sections was six to ten, or pursued until node exhaustion in case of large SN. Additional IHC analysis was done in case of negative results at standard examination. For additional nodes identified by completion ALND, routine HE analysis was performed.

Five categories of LN status were defined: negative LN (pN0(i-)); isolated tumor cells (pN0(i+): < 0.2 mm), detected either by hematoxylin and eosin (HE) staining or by cytokeratin IHC; micro metastases (pN1mi: > 0.2 mm and < 2 mm); and macro metastases (> 2 mm), divided into single and multiple macro metastases [20].
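
As an illustration, the five categories can be written as a classification on the size of the tumor deposit found in each examined node. This is a sketch under stated assumptions: the function name `ln_category` and its input (one largest-deposit size in mm per node, 0 for a negative node) are hypothetical; deposits of exactly 0.2 mm and 2 mm, which the text leaves unassigned, are placed in the lower category; and "single" versus "multiple" is assumed to count nodes with a macro metastasis.

```python
def ln_category(deposit_sizes_mm):
    """Classify final LN status from the largest deposit (mm) in each node.

    Assumptions (not from the source): boundary values 0.2 mm and 2 mm fall in
    the lower category; single/multiple counts nodes with a deposit > 2 mm.
    """
    largest = max(deposit_sizes_mm, default=0.0)
    n_macro = sum(1 for size in deposit_sizes_mm if size > 2.0)
    if n_macro > 1:
        return "multiple macro metastases"
    if n_macro == 1:
        return "single macro metastasis"
    if largest > 0.2:
        return "pN1mi"      # micro metastasis
    if largest > 0:
        return "pN0(i+)"    # isolated tumor cells
    return "pN0(i-)"        # negative nodes


print(ln_category([0.0, 0.0]))       # pN0(i-)
print(ln_category([0.1, 0.0]))       # pN0(i+)
print(ln_category([1.5, 0.0]))       # pN1mi
print(ln_category([3.0, 0.5, 4.2]))  # multiple macro metastases
```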

Statistical methods

Our main objective was to create prediction models for the risk of LN positivity and the risk of LN macroscopic metastases from clinical and pathologic variables provided by tumor surgical results or by biopsy, and to evaluate their performance with respect to three main features: discrimination (i.e. whether the relative ranking of individual predictions is in the correct order), calibration (i.e. agreement between observed outcomes and predictions) and clinical utility, defined as the proportions of patients classified into risk categories using predefined cutoff values (< 10%, between 10 and 20%, between 20 and 30%, between 30 and 40%, and ≥ 40%). Our main evaluation criteria were based on the final status of LN metastases (pN0(i+), pN1mi or pN1ma) as the result of SLNB alone or the final result of both SLNB and ALND. LN positivity was defined as the presence of isolated tumor cells, micro or macro LN metastases. We used logistic regression models [21] including age (≤ 40, 41–75, > 75), tumor size (≤ 20, 20–30, ≥ 30 mm) or clinical T stage (T0-T1, T2, T3-T4), tumor grade, histology type, LVI, and molecular subtypes as predictor factors to predict each individual risk. The list of predictor factors was set beforehand, based on the investigators' experience and some reference papers [6,7,8,9,10,11, 13,14,15, 17]. Because only 5 or 6 predictor factors were identified beforehand, no additional variable selection procedure was used in the regression analysis. Prior to analysis, we randomly divided our initial cohort (N = 12,572) into two separate sub-cohorts: a large training cohort (N = 8381) to create prediction models and a confirmatory cohort (N = 4191) to evaluate their individual prediction performance. A split-sample approach was adopted in order to obtain unbiased estimates of model performance, as these estimates are known to be biased upwards when they are computed on the same dataset used to estimate the regression parameters [22].
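
The split-sample step can be sketched as follows. This is an illustration only: the reported sizes (8381 and 4191) correspond to roughly a two-thirds / one-third split, but the exact fraction, the random seed and the helper name `split_cohort` are assumptions, not details given in the text.

```python
import random


def split_cohort(patient_ids, train_fraction=2 / 3, seed=0):
    """Randomly split patient identifiers into training and confirmatory sets.

    train_fraction and seed are assumed values chosen for illustration.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(ids)
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


training, confirmatory = split_cohort(range(12572))
print(len(training), len(confirmatory))  # 8381 4191
```

Models are then fitted on the training identifiers only, and every performance measure is computed on the confirmatory identifiers.
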
First we performed a descriptive analysis using the following criteria: patient's age at SN biopsy, clinical and pathological tumor size, tumor grade and histology type, presence or absence of lymphovascular invasion (LVI), presence of estrogen (ER), progesterone (PR) and hormonal receptors (HR), HER2 positivity, tumor subtype, number of SN removed and final LN status. Each model was evaluated in the training sample and in the confirmatory sample. Patient and tumor characteristics were compared using chi-square or Fisher's exact tests, and Student or Wilcoxon rank sum tests, as appropriate. Discrimination ability was evaluated by the area under the ROC (Receiver Operating Characteristic) curve (AUC). We used the roc function of the pROC package in R to estimate AUCs with 95% CI and to test for differences in AUCs using DeLong's method in the confirmatory sample [23]. Empirical distributions of AUC observed after re-fitting a model on bootstrap replicates (B = 2000) were used to estimate AUCs and differences in AUCs with 95% CI in the training sample. Model calibration was evaluated using the Hosmer-Lemeshow goodness-of-fit test [24]. All statistical analyses were conducted in the R Language and Environment for Statistical Computing version 3.2.5 (The R Foundation, Vienna, Austria).
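
To make the discrimination measure concrete, the sketch below computes the AUC as the proportion of positive/negative patient pairs ranked in the correct order, with a percentile bootstrap interval. It is a simplified illustration rather than the procedure used in the study: it does not use pROC or DeLong's method, and it resamples fixed predictions instead of re-fitting the model on each bootstrap replicate. The function names and the toy data are hypothetical.

```python
import random


def auc(labels, scores):
    """AUC: probability that a positive case outranks a negative one (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (predictions resampled, model not re-fitted)."""
    if len(set(labels)) < 2:
        raise ValueError("both outcome classes are required")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip replicates that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data: 1 = positive LN, scores = predicted probabilities
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

The pairwise double loop is quadratic in the number of patients, so for a cohort of this size a rank-based implementation would be preferable in practice.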

Results

Patients’ characteristics

Patients’ main characteristics are summarized in Table 1. SBR grade was 1, 2 and 3 in 34, 46 and 20% of cases respectively. Hormone receptor-positive tumors (ER+ and/or PR+) accounted for 88% of cases (11,013 patients). Final LN status, taking into account ALND results when performed, was pN0(i-) in 8253 patients (66%), pN0(i+) in 355 (3%), pN1mi in 970 (8%) and macro metastasis in 2994 (24%). The comparisons between patients with positive and negative final LN status, and between patients with LN macro metastases and those with pN0(i-), pN0(i+) or pN1mi status, showed statistically significant differences with regard to age, pathologic tumor size, SBR grade, LVI, histological type and distribution of molecular subtypes (Tables 2 and 3).

Table 1 Population: all patients and patients according to initial data set or validation set
Table 2 Initial data set and validation set results according to axillary nodal involvement
Table 3 Initial data set and validation set results according to axillary nodal macro metastasis involvement

We first predicted the individual probabilities of final LN positivity and of detecting LN macro metastases from selected clinico-pathologic predictor factors provided by tumor surgical results. For the first pathological model (LN positivity), the AUCs with 95% CIs for the confirmatory and training samples were respectively 0.767 [0.750–0.783] and 0.755 [0.744–0.767]. The calibration plot and Hosmer-Lemeshow test indicated adequate calibration (p = 0.332 in the confirmatory sample, p = 0.158 in the training sample). With respect to clinical utility in the confirmatory and training samples, the probability of positive LN was respectively below 10% for 7 patients (< 1%) and 19 patients (< 1%), between 10 and 20% for 1096 (31%) and 2255 (32%), and ≥ 20% for 2409 (68.6%) and 4859 (68.1%) patients (Table 4) (Fig. 1A, Additional file 1: Figure S1A and Additional file 2: Figure S2A). The second pathological model estimated the probability of detecting LN macro metastases only. The AUC values for the confirmatory and training samples were respectively 0.798 [0.780–0.815] and 0.780 [0.767–0.790]. In terms of clinical utility, the estimated probability of LN macro metastases in the confirmatory and training samples was respectively below 10% for 1004 patients (29%) and 2029 patients (28%), between 10 and 20% for 1075 patients (31%) and 2289 patients (32%), and > 20% for 1433 patients (41%) and 2815 patients (39.4%). The Hosmer-Lemeshow test revealed poor calibration of the model in the confirmatory sample (p = 0.024) but not in the training sample (p = 0.427) (Table 4 and Additional file 1: Table S1) (Fig. 1B, Additional file 2: Figure S1B, Additional file 3: Figure S2B).
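
The clinical utility figures above are simple tabulations of predicted probabilities into risk bands. The sketch below shows one way to assign the predefined bands from the Statistical methods and recomputes the percentages of the LN positivity model from the reported counts. The names `risk_band` and `proportions` are hypothetical, and treating each band as including its lower bound is an assumption.

```python
# Upper bounds of the predefined risk bands (lower bound assumed inclusive)
RISK_BANDS = [(0.10, "<10%"), (0.20, "10-20%"), (0.30, "20-30%"), (0.40, "30-40%")]


def risk_band(probability):
    """Return the predefined risk band for a predicted probability."""
    for upper, label in RISK_BANDS:
        if probability < upper:
            return label
    return ">=40%"


def proportions(counts):
    """Percentage of patients in each band, rounded to one decimal."""
    total = sum(counts.values())
    return {band: round(100 * n / total, 1) for band, n in counts.items()}


# Counts reported for the pathologic LN positivity model
confirmatory = {"<10%": 7, "10-20%": 1096, ">=20%": 2409}
training = {"<10%": 19, "10-20%": 2255, ">=20%": 4859}

print(sum(confirmatory.values()), proportions(confirmatory))
# 3512 {'<10%': 0.2, '10-20%': 31.2, '>=20%': 68.6}
print(sum(training.values()), proportions(training))
# 7133 {'<10%': 0.3, '10-20%': 31.6, '>=20%': 68.1}
```

The band counts sum to 3512 and 7133, fewer than the 4191 and 8381 patients in each sample. The text does not state why; patients with missing predictor values would be one possible explanation.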

Table 4 Discrimination, calibration and clinical utility measures of pathologic and pre-operative prediction models
Fig. 1 Nomograms. 1a: Nomogram predictive of LN involvement – Pathologic model. 1b: Nomogram predictive of LN macro metastases – Pathologic model. 1c: Nomogram predictive of LN involvement – Clinical model. 1d: Nomogram predictive of LN macro metastases – Clinical model.

We evaluated the loss in discrimination ability in pre-operative prediction models that omit the information about LVI and replace pathological tumor size with clinical T stage. For the overall probability of LN positivity, the AUC values for the confirmatory and training samples were respectively 0.687 [0.669–0.705] and 0.682 [0.669–0.694]. For the probability of detecting LN macro metastases, the AUC values for the confirmatory and training samples were respectively 0.727 [0.707–0.746] and 0.717 [0.703–0.732]. The calibration of both pre-operative models was found to be satisfactory (Table 4) (Fig. 1C-D, Additional file 2: Figure S1C-D, Additional file 3: Figure S2C-D).

The decrease in AUC between the pathological and pre-operative models was statistically significant (p < 0.001). We also evaluated, in the confirmatory sample, the discrimination ability of the prediction models obtained when treating age and tumor size as continuous variables. The AUC values for predicting LN positivity and the presence of LN macro metastases were respectively 0.774 [0.758–0.790] and 0.805 [0.789–0.823]. The observed increases were significant (p = 0.041 and p = 0.026), but calibration was judged inadequate (Hosmer-Lemeshow p value < 0.001).

Discussion

The aim of this study was to better understand the relationships between tumor characteristics and the probability of axillary LN positivity. The large cohort used in our study is appropriate for less frequent tumor phenotypes (namely HER2+ and HR−/HER2−). We distinguished between various histological tumor types and found a lower LN positivity rate in tumors other than ductal, lobular or mixed, as previously reported for BC with favorable histology (tubular, mucinous, papillary, medullary, adenoid cystic and secretory), which is associated with a very low LN positivity rate [25].

In our model, we used the same independent variables as Reyal et al. [11], namely age, tumor size, molecular subtypes and LVI, and we added grade and histological type. However, the age intervals were different, as were the tumor phenotype definitions (ER only in the Reyal model) and the description of tumor size (continuous variable in the Reyal model). We obtained different odds ratios for the same variables, and the clinical utility results were different, with a higher proportion of patients at low probability of positive lymph nodes in our population, particularly for macro metastases, for both models. Clinical utility results for a low probability of positive lymph nodes could help avoid surgical axillary staging by sentinel lymph node biopsy or axillary lymph node dissection.

The models were less reliable when information about LVI was missing. LVI can be detected on pre-operative biopsies, but much less accurately than on surgical specimen analysis.

HER2 status was unknown in older studies [8], and other studies were based on small numbers of patients. We found that HER2 negative tumors were associated with LN positivity less frequently than HER2 positive tumors (22.9% vs. 31.9%). Lu et al. reported that the lowest probability of node metastasis was found for ER−/HER2− tumors [12]. Similarly, in our study, triple negative tumors had the lowest probability of node metastasis, while HR−/HER2+ tumors had the highest probability. Reyal et al. hypothesized that the axillary LN metastatic process is predominantly related to intrinsic biological properties in ER-negative and HER2-negative BC, while tumor size, proliferation rate and LVI are the main determinants in ER positive or HER2 positive breast cancers. However, in triple negative BC, positive axillary lymph nodes were an adverse prognostic factor not only in case of sentinel node macro-metastases but also in case of occult sentinel node involvement (pN0(i+) and pN1mi) [26].

A reliable predictive model of LN positivity, based on pathologic parameters, can be used to compare populations from different studies, particularly for trials with or without an axillary surgical procedure. Above all, it might make it possible to avoid SN biopsy when the probability of positivity is very low (< 10%). Some authors have already suggested that SN biopsy could be omitted in tumors with good-prognosis subtypes [25] or that axillary dissection is useless in older patients [27]. We believe that these criteria lack accuracy and we prefer a decision-making approach based on molecular subtypes. However, we must be aware of the risk of insufficient treatment in small tumors with favorable prognostic factors, in which LN status is a major determinant of adjuvant chemotherapy and regional radiotherapy. Moreover, the model is less reliable when LVI is not documented, which is usually the case before surgery. Ultra-sonography of the axilla with percutaneous biopsy is a growing practice. These clinical predictive tools may help select patients at high risk of axillary LN involvement for axillary ultra-sonography with percutaneous LN biopsy.

These models can also help determine the indications for post mastectomy radiotherapy in patients with axillary lymph node macro-metastases [28], particularly when immediate breast reconstruction can be proposed.

Conclusions

We reported a reliable predictive model of LN positivity according to different early breast carcinoma phenotypes in a large cohort. Determining the risk of positive axillary LN can contribute significantly to therapeutic decisions. These models, with or without LVI results, can be used to determine the risk of positive axillary LN or the risk of LN macro-metastasis. Before surgery, clinical models can be used to decide whether or not to propose SLNB according to the probability of LN involvement. After surgery, if SLNB was omitted and the probability of LN involvement is high, so that LN status could modify adjuvant treatment indications, a re-operation (SLNB or cALND) can be proposed. Thus, clinical and pathologic models should be helpful in surgical planning, both in clinical trials and in clinical practice: to avoid SLNB when the risk of LN involvement is very low, to avoid re-operation when SLNB has been omitted, to propose ALND for patients with a high probability of major axillary LN involvement, and to propose immediate breast reconstruction when post mastectomy radiotherapy (PMRT) is not required.