Introduction

Breast cancer is the most frequent malignancy and the leading cause of cancer-related death in women, representing approximately 25% of all malignancies and 15% of cancer-related deaths in the world [1, 2]. Although most patients are diagnosed with stage I-III disease, up to one-third will eventually develop distant recurrence [3]. The risk and timing of recurrence is influenced by patient and tumor characteristics as well as the receipt of appropriate local and systemic treatments [4,5,6,7,8]. Recurrences of breast cancer can occur both early (within 5 years of diagnosis) and late (more than 5 years from diagnosis). Quantifying the risk of early and late recurrences is a topic of major interest; hormone receptor (HR) status appears to be one of the strongest factors influencing risk. Early recurrences predominate in patients with HR-negative breast cancers, whereas late recurrences are common in patients with HR-positive tumors [9, 10]. Multiple tools are being evaluated to stratify the risk of early versus late recurrence, including risk calculators based on traditional clinicopathologic factors and gene expression-based assays [11,12,13,14,15,16,17,18].

However, most studies reporting on the risk of late recurrence in breast cancer have been based on selected patients enrolled into clinical trials, and population-based assessments of the risks of long-term breast cancer-specific mortality (BCSM) are lacking [9, 10, 19]. Further, studies evaluating this phenomenon have focused primarily on HR-positive breast cancer and less is known about longer term risk in HR-negative disease. The aims of this study were to report on population-based long-term risks of BCSM across HR-positive and HR-negative subtypes, and the risks of BCSM conditional on having survived 5 years. In addition, we aimed to identify factors associated with late deaths from breast cancer. Having more accurate estimates of the risks of late recurrence and long-term BCSM may inform clinical counseling in current breast cancer survivors and support the conceptual and statistical design of clinical trials focused on mitigating the risk of late recurrence.

Patient and methods

Data source and study design

We used data from the Surveillance, Epidemiology, and End Results (SEER) 18 registry (1973–2015) database [20]. We included women diagnosed with a first confirmed invasive breast cancer between 1990 and 2005. Because data on estrogen receptor (ER) and progesterone receptor (PR) status has been recorded since 1990, this was chosen as the initial year of diagnosis for inclusion. Given our aim of analyzing long-term risks of BCSM, we chose to include patients up until 2005, in order to allow for a minimum of 10 years of potential follow-up time.

To keep our cohort consistent with a recent meta-analysis [19], we included patients presenting with T1a, T1b, T1c, or T2 tumors; N0, N1, or N2; M0. Tumor size (T) and nodal status (N) were registered according to the American Joint Committee on Cancer staging system sixth edition. In addition, in order to most clearly differentiate HR status, we included only patients whose ER and PR status were coded as ‘positive’ or ‘negative’, and did not include cases coded as ‘borderline’ or ‘unknown’. We excluded patients who did not undergo surgery for their primary tumor (n = 1375), those with no or unknown regional nodal examination (n = 13,344), unknown number of positive lymph nodes (n = 549), or unknown cause of death (n = 1,576). In order to provide a more accurate reflection of outcomes attributable to the index cancer, we also excluded patients who had more than one primary malignancy over their lifetime (n = 106,474). Supplemental Figure 1 shows the inclusion and exclusion criteria for the patient population. The final sample size was 202,080 patients.

We evaluated the following variables as defined and categorized in Table 1: age at diagnosis, race, year of diagnosis, histology, tumor grade, tumor size, ER, PR, type of breast surgery, number of lymph nodes examined, number of positive lymph nodes, marital status, cause of death, and vital status. SEER reports four tumor grades, which we consolidated into three categories: grade I (well differentiated), grade II (moderately differentiated) and grade III/IV (poorly differentiated or anaplastic). Using the information from ER and PR status, we created a variable called hormone receptor (HR) which we categorized as positive (when either ER or PR was positive) or negative (when both ER and PR were negative).

Table 1 Patient characteristics

Given that only deidentified data were used for this study, the study was considered exempt from Dana-Farber Cancer Institute’s Institutional Review Board review.

Statistical analysis

For each variable, we excluded patients with unknown data from the comparative analysis. The primary endpoint for the study was BCSM, using SEER cause-specific death classification, and defined as the interval from initial breast cancer diagnosis to death from breast cancer or last follow-up for censored patients. Deaths due to causes other than breast cancer were considered as non-breast cancer-specific mortality (non-BCSM). We used Kaplan–Meier analyses to determine the effect of baseline variables as defined in Table 1 on cumulative risks of BCSM and non-BCSM, starting from year 5 after initial diagnosis. A Log-rank test was used to evaluate the difference between groups. We estimated the annual rate of events per 100 person-years. Fine and Gray regression was used to evaluate the association of multiple pre-specified variables with the risk of BCSM, when considering deaths from other causes as competing events [21]. Pre-specified variables included were T, N, age at diagnosis, race, tumor grade, HR status, and marital status. Given the long study period, we also included year of diagnosis in the multivariate models. We checked the proportional hazards assumption for each variable using Schoenfeld residual plots and identified that HR status violated the assumption whereas the other variables did not, therefore, the Fine and Gray model was stratified by HR status. All P values reported were two sided and P values < 0.05 were considered statistically significant. All statistical analyses were performed using STATA 12.0 (Stata Corporation, College Station, TX) and SPSS 20.0 (IBM Corporation, Armonk, NY).

Results

Patient characteristics

A total of 202,080 patients met inclusion/exclusion criteria for the final analytic cohort. Forty-five percent of cases (n = 91,505) were diagnosed between the years 1990 and 2000 and 55% (n = 110,575) were diagnosed between 2001 and 2005. Table 1 shows the distribution of patient characteristics according to nodal status. Overall, most patients were age d ≥ 50 years (71.9%, n = 145,300) and white race (84.1%, n = 169,883). The predominant tumor size was T1c (43.2%, n = 87,216) and 67.9% of patients (n = 137,122) were N0. Most patients (56.1%) underwent partial mastectomy. A total of 79.8% of patients (n = 161,290) had HR-positive tumors. Median age for HR-positive patients was 59 years (range, 17–102 years) and for HR-negative patients was 54 years (range, 18–99 years).

Cumulative risk of BCSM

After a median follow-up of 14.17 years (IQR, 11.83 and 17.42 years), we observed 70,914 deaths of which 42.3% (n = 30,013) were due to breast cancer.

Among all breast cancer-related deaths, the proportion that occurred after 5 years from diagnosis was 65% for HR-positive breast cancer vs 28% for HR-negative breast cancer (P < 0.001). The distribution of breast cancer-related deaths over time is shown in Table 1. A total of 176,713 women were alive after 5 years from diagnosis and were included in the analysis of BCSM from year 5. Of these, 32,345 (18.3%) were HR-negative.

The cumulative risk of BCSM from year 5 after initial breast cancer diagnosis according to N and HR status is shown in Fig. 1a, b and Table 2. The cumulative risk of BCSM from year 5 to year 20 ranged from 7.9% in HR-negative N0 breast cancer to 38% in HR-positive N2 breast cancer. For each nodal status group, the risk of BCSM at 20 years was higher for patients with HR-positive tumors compared with patients with HR-negative tumors (for N0: 9.9% v 7.9%, P = 0.17; for N1: 21.9% v 12.2%, P < 0.0001; for N2: 38% v 19.9%, P < 0.0001).

Fig. 1
figure 1

Unadjusted risk of breast cancer-specific death according to nodal status (among patients with a HR + breast cancer and b HR− breast cancer) and tumor size (among patients with c N0 HR + breast cancer and d N0 HR− breast cancer) starting from year 5 after diagnosis. HR− hormone receptor negative, HR + hormone receptor positive, N nodal status, T tumor size

Table 2 Association of nodal status, tumor size, and grade with cumulative risk of breast cancer-specific mortality and non-breast cancer-specific mortality from years 5 to 20

The impact of tumor size by HR status on cumulative risk of BCSM among patients with N0 disease is shown in Fig. 1c, d and Table 2. The cumulative risks of BCSM from year 5 to year 20 for T1a and T1b HR-positive tumors were 4.6% and 5.9%, respectively, and 10.1% and 16.8% for T1c and T2 HR-positive tumors. In HR-negative disease, the cumulative risk of BCSM during the same period of time ranged from 6.1 to 8.4%. The analyses of tumor grade by HR status among patients with N0 disease are shown in Supplemental Figure 2a, b and Table 2. The cumulative risks of BCSM from year 5 to 20 among HR-positive disease with grade I, grade II, and grade III/IV were as follows: 5.7%, 9.9%, and 13.5%, respectively. Among patients with HR-negative disease, the cumulative risk of BCSM during that time ranged from 6.6 to 10.8%.

Table 3 shows the multivariate Fine and Gray analyses stratified by HR status. Among patients with HR-positive breast cancer, independent prognostic factors for BCSM conditional on having survived 5 years included tumor size, nodal status, age at diagnosis, race, tumor grade, marital status, and year of diagnosis. In contrast, among patients with HR-negative breast cancer, independent prognostic factors for BCSM conditional on having survived 5 years included tumor size, nodal status, age at diagnosis, and year of diagnosis, but race was not significantly associated with breast cancer survival outcomes.

Table 3 Multivariate analysis for breast cancer-specific mortality starting from year 5 after diagnosis stratified by hormone receptor status

Annual rates of BCSM

We evaluated annual rates of BCSM for the period of time from initial diagnosis to 20 years after diagnosis, divided into 5 year intervals. The results for each nodal status category by HR status are shown in Fig. 2a, b and Table 4. Among patients with N0 HR-positive breast cancer, we observed a constant rate of annual events throughout the years 0 to 20. Patients with N1 and N2 HR-positive breast cancer reached a maximum rate of events at year 5–10, followed by a decrease up to year 20. In contrast, for each nodal status category within HR-negative breast cancer, annual rates of BCSM peaked within the first 5 years from diagnosis, followed by a sharp decline up to year 20. Overall, the annual rates of BCSM from year 5 to 20 were lower for patients with HR-negative breast cancer compared with those for HR-positive breast cancer.

Fig. 2
figure 2

Annual rate of breast cancer-specific mortality according to nodal status among patients with a HR + breast cancer and b HR− breast cancer from the time of initial diagnosis. HR− hormone receptor negative, HR + hormone receptor positive, y years

Table 4 Annual rate of breast cancer-specific mortality by HR and nodal status from years 0 to 20, divided in 5 year intervals

Cumulative risk of non-BCSM

We evaluated non-BCSM for various subgroups of patients and estimated cumulative risk from year 5 to 20 (Table 2, Supplemental Figure 3a-c). Compared with the risk of BCSM from years 5–20, the risk of non-BCSM during the same period was approximately 3 times higher in both N0 HR-positive and N0 HR-negative breast cancer. The risk of non-BCSM was 1.2 and 1.6 times higher than the risk of BCSM in patients with N1 HR-positive and N1 HR-negative disease, respectively. Patients with N2 HR-negative breast cancer had similar risk of BCSM and non-BCSM from years 5–20, whereas patients with N2 HR-positive disease had a relative 30% lower risk of non-BCSM than BCSM.

Discussion

Deaths beyond 5 years from initial breast cancer diagnosis remain a significant clinical challenge. Our study was designed to evaluate population-based long-term risks of BCSM for both HR-positive and HR-negative breast cancers. In addition, we report the risk of non-BCSM, the risk of BCSM conditional on having survived 5 years, and the factors associated with late breast cancer deaths.

In HR-positive breast cancer, the risk of BCSM was constant for node-negative patients with an annual rate of events around 0.7%. The risk was significantly higher in node-positive patients, in whom the annual rate of events was not constant over time. Patients with N1 and particularly N2 disease experienced a higher rate of BCSM within the first 10 years from diagnosis and consequently a lower rate after 10 years. However, despite this lower risk beyond 10 years, node-positive patients still remained at a higher risk of BCSM even after 10 years from diagnosis when compared with their node-negative counterparts, with annual rates of events that were twice higher for N1 and three times higher for N2 than in node-negative patients. The population-based source of these data provides robust and clinically valuable estimates of BCSM in various subset of patients with HR-positive breast cancer. A recent meta-analysis of 62,923 women with HR-positive breast cancer, who were disease-free after 5 years of planned endocrine therapy, reported that the risks of distant recurrence from 5 to 20 years were 15% for N0, 23% for N1, and 38% for N2 [19]. In our study, the risks of BCSM from 5 to 20 years were 9.9%, 21.9%, and 38% for N0, N1, and N2, respectively. Similar results were reported in two prior studies, although with smaller sample sizes and shorter follow-up [22, 23]. Our study showed that most breast cancer deaths in HR-positive breast cancer occur after 5 years from initial diagnosis (65%), whereas only 28% of breast cancer deaths occur during the same time period in patients with HR-negative disease. This is likely due to the longer survival post-distant recurrence of HR-positive breast cancer compared with HR-negative disease [22].

In HR-negative breast cancer, the risk of BCSM was highest within the first 5 years from diagnosis, regardless of nodal status. However, beyond 5 years, the risk of BCSM remained important. For example, for patients with N2 disease, the absolute risk of BCSM between years 5 and 20 was 19.9%. There is a paucity of data on long-term outcomes for patients with HR-negative disease, and our study provides essential risk estimates for this subset of breast cancer. A pooled analysis that included 1148 ER-negative breast cancers showed that the annual hazard of recurrence was higher for ER-negative disease than for ER-positive disease within the first 5 years from diagnosis, and that beyond 5 years, patients with ER-positive breast cancer had higher annual hazard of recurrence than those with ER-negative disease [9]. A retrospective analysis of 873 patients with triple-negative breast cancer who remained disease-free at 5 years from diagnosis had a 15 year relapse-free interval of 95% [24].

The analysis of non-BCSM from our study helps to integrate the risks of BCSM into the perspective of the overall risk of death for various groups of patients. In patients with lower-risk tumors such as T1a or T1b, N0, HR-positive breast cancer, the risk of BCSM from 5 to 20 years was approximately 6 times lower than the risk of non-BCSM. Patients with N1, HR-positive breast cancer had similar risks of BCSM and non-BCSM from 5 to 20 years. In contrast, in patients with higher-risk tumors such as N2, HR-positive breast cancer, the risk of BCSM from years 5 to 20 was 43% higher than the risk of non-BCSM. These results could assist in the refinement of guidelines for treatment and follow-up of breast cancer patients that take into account both breast cancer risk and competing risks for mortality.

The multivariate Fine and Gray analyses stratified by HR status aimed to identify specific factors associated with BCSM conditional on having survived 5 years, while accounting for competing risks of death. Tumor size and nodal status were independently and strongly associated with BCSM both in HR-positive and in HR-negative breast cancer. Patients age 65 years and older had worse BCSM regardless of HR status, a finding that is consistent with results from a recent study [25]. The multivariate analysis also showed a significant improvement in BCSM over time, with a hazard ratio of 0.94 per year in both HR-positive and HR-negative breast cancer. This may be due to improvements in screening and early breast cancer detection, and advances in systemic therapy over the study period. Overall, our results underscore the role of traditional clinicopathologic factors, which remain strongly associated with BCSM even beyond 5 years from breast cancer diagnosis. While this was accepted in HR-positive disease, our study confirms that tumor size and nodal status also influence BCSM beyond 5 years in HR-negative breast cancer.

Our study has a number of important strengths. To our knowledge, this is the largest population-based analysis conducted to date on the risks of long-term BCSM with estimates at 20 years from initial diagnosis. The inclusion of HR-negative breast cancer allows us to provide risk estimates for that subgroup which has been understudied. The analysis of factors associated with BCSM conditional on having survived 5 years provides a unique opportunity to analyze these associations in clinically relevant subgroups at risk of late events. Finally, the population-based design of our study confers strong external validity to our results.

We also acknowledge a number of study limitations. First, we do not have data on the use of systemic therapies or adherence to treatment, which substantially impact outcomes. However, our data are consistent with the results for women with HR-positive breast cancer in a meta-analysis of patients treated in clinical trials [19]. Another limitation is the lack of information about HER2 status. Although adjuvant anti-HER2 therapies did not become standard until 2005, trastuzumab became standard therapy for metastatic HER2-positive breast cancer around 1999. Thus, some of the late BCSM we observed in HR-positive and in particular in HR-negative cases may be due to HER2-positive disease treated with trastuzumab with an anticipated longer survival than those with triple-negative disease. The long-term follow-up of the pivotal adjuvant trastuzumab trials showed recurrences in HER2-positive cases between years 5 and 10, and rates of late recurrence in HER2-positive disease are higher in HR-positive/HER2-positive subtype [26].

Further, SEER does not collect information on cancer recurrences, which would have provided additional risk estimates if available. The risks reported in our study may not accurately reflect the risks of patients diagnosed with breast cancer in more recent years, as the prognosis of breast cancer has improved in recent years with earlier diagnosis and better treatment modalities.

In conclusion, we observed that for HR-positive breast cancer, risks of BCSM remain high beyond 5 years from diagnosis and depend on tumor size, nodal status, age, race, and tumor grade. For HR-negative breast cancer, risks of BCSM are highest within 5 years from diagnosis; however, risks beyond 5 years are still important and depend on tumor size, nodal status, and age. Our results provide essential estimates of late BCSM events that could be used in the statistical design of clinical trials evaluating late endpoints, both in HR-positive and HR-negative breast cancer. We identified a group of lower-risk patients in whom the risk of BCSM was 6 times lower than the risk of non-BCSM, and a group of higher-risk patients in whom the risk of BCSM was higher than the risk of non-BCSM. Our results underscore the need for better therapies and strategies to prevent early and late recurrences in both HR-positive and HR-negative breast cancer.