Introduction

In nearly all cases, a woman who dies of breast cancer experiences a distant recurrence prior to her death. There are several steps between diagnosis and death. First, cancer cells must migrate from the breast to a distant site and establish a stable presence. Second, the cancer must proliferate in the metastatic niche or set up secondary metastases. Third, the cancer must survive eradication by hormonal therapy and/or chemotherapy and be recalcitrant to a host immune response. One can also posit a dormant period wherein the cancer cells are stable in the metastatic niche but are not proliferating; in this case, transition from a dormant to an active state is a further condition of fatality. The time from diagnosis to death varies from months to years and may be particularly long for women with ER-positive breast cancers [1]. It has been proposed that the long survival times for women with ER-positive cancers are the consequence of a prolonged dormant period, but this is unproven [1].

Previously, we used three complementary metrics to describe the time course of deaths in various groups of women with breast cancer [2]. The first is the annual mortality rate post-diagnosis, the second is the twenty-year Kaplan–Meier actuarial survival, and the third is the frequency distribution of times to death. Using these metrics, we described the heterogeneous nature of time-to-death according to patient subgroup and tumour sub-type [2].

Women with triple-negative breast cancers experience a peak distant recurrence rate at 1 year (15% per year) followed by a sharp decline that falls below that of ER-positive breast cancers at 5 years [3, 4]. The mortality rate for ER-positive breast cancer patients is stable between year 3 and year 20 [3]. A woman with a small node-negative ER-positive breast cancer is as likely to die in year 20 as in year 3 [2]. Unpredictability in time-to-death is an inherent property common to all breast cancer subgroups with low fatality rates [2]. We are able to calculate life expectancy at a group level but are unable to predict date of death at an individual level. We propose that variation in time-to-death can be explained by reference to the rate of transition of metastases from a dormant to an active state. We sought to determine whether tumour dormancy is a feature of all breast cancer cases or only those which are ER-positive.

Methods

Study subjects

We extracted data on all breast cancer cases diagnosed from 1990 to 1999 from the Surveillance, Epidemiology and End Results (SEER) data set. Patient characteristics include year of birth, year of diagnosis, age at diagnosis, race and household income. Tumour characteristics include grade (I, well differentiated; II, moderately differentiated; III, poorly differentiated; IV, undifferentiated/anaplastic; unknown), tumour size, nodal status (N0, N1, N2, N3, unknown), stage (I, II, III), estrogen receptor (ER) status (negative, positive) and progesterone receptor (PR) status (negative, positive, unknown). Cancer treatment variables include radiation (no, yes, unknown) and chemotherapy (no/unknown, yes). Information on tamoxifen use and on surgical procedures for the primary tumour was not available in SEER prior to 1998.

We excluded patients with a prior history of cancer and patients with DCIS, stage IV breast cancer or unknown stage. We also excluded patients whose tumour size or estrogen receptor status was unknown. The final cohort consisted of 123,705 women with a first primary invasive breast cancer. The mean follow-up was 13.1 years and, among patients still alive, 95% were followed for 15 years or more. Patients were followed from breast cancer diagnosis to breast cancer-specific death, death from another cause, loss to follow-up, or 20 years post-diagnosis. The patient cohort is described in Table 1.

Table 1 Baseline characteristics of breast cancer patient cohort by ER status

Classifying probability of breast cancer death

We aimed to delineate a statistical relationship between the probability of death from breast cancer and time-to-death. The overall strategy was to generate a series of mortality curves for patient subgroups with varying risk profiles, defined by tumour and host factors. The probability of death from breast cancer was defined as the actuarial risk of dying of breast cancer by 20 years post-diagnosis.

Using all the available data, ten risk groups of equal size were constructed and ranked on their 20-year actuarial probability of death from the lowest risk (decile 1) to highest risk (decile 10). Survival was modelled using a disease risk score approach in combination with Cox regression analysis, with year at diagnosis, age, income, race, tumour size, stage, grade, nodal status, ER status, PR status, radiotherapy and chemotherapy as predictors. To account for potential nonlinearity of continuous variables, we modelled natural cubic splines for age at diagnosis, income and tumour size. The Breslow estimator was used to obtain cumulative baseline hazard functions and generate an actuarial probability of death from breast cancer at 20 years for each patient [5].
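As an illustration of the step above, the following minimal sketch shows how a Cox model's linear predictor and a baseline cumulative hazard at 20 years can be converted into an individual 20-year probability of death, and how patients can then be ranked into ten equal-sized risk groups. It uses the standard Cox relationship S(t | x) = exp(−H0(t)·exp(lp)). The function names, the baseline hazard value and the linear predictors are hypothetical and are not taken from the study.

```python
import math

def prob_death_20y(baseline_cum_hazard_20y, linear_predictor):
    """Cox model: S(t | x) = exp(-H0(t) * exp(lp)); return 1 - S(20 | x)."""
    return 1.0 - math.exp(-baseline_cum_hazard_20y * math.exp(linear_predictor))

def assign_deciles(probabilities):
    """Rank patients by predicted risk and split them into ten equal-sized
    groups, numbered 1 (lowest risk) to 10 (highest risk)."""
    n = len(probabilities)
    order = sorted(range(n), key=lambda i: probabilities[i])
    deciles = [0] * n
    for rank, idx in enumerate(order):
        deciles[idx] = rank * 10 // n + 1
    return deciles

# Hypothetical inputs: a baseline cumulative hazard at 20 years (as a Breslow
# estimate would provide) and linear predictors for 20 made-up patients.
H0_20 = 0.15
linear_predictors = [-1.0 + 0.1 * i for i in range(20)]
probs = [prob_death_20y(H0_20, lp) for lp in linear_predictors]
deciles = assign_deciles(probs)
```

In practice the linear predictor would come from the fitted regression coefficients and spline terms; here it is simply supplied as a number.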

Modelling tumour dormancy

We asked whether the differences in the mortality experience of the ten risk groups could be explained entirely by a model in which only the duration of tumour dormancy varied. Under this model, after reactivation from the dormant state, all cancers experience the same growth rate. We model dormancy as a single stochastic rate of transition from dormant to active. We assume that all women in the same risk decile experience the same rate of reactivation, that reactivation occurs at random and that the annual reactivation rate is constant for the entire 20-year follow-up period. We define the tumour reactivation factor α as a scalar variable that represents the rate of cancer reactivation. The factor is assumed to be independent of time from diagnosis, and reactivation is modelled as a Poisson process. The tumour reactivation factor can take on any value greater than zero: an α value that approaches zero results in a low rate of tumour reactivation, and an α value that approaches infinity results in immediate reactivation. Quantitatively, the tumour reactivation factor represents the rate at which cancers become reactivated per year of follow-up. For example, an α of 0.10 per year would result in approximately 10,000 of every 100,000 still-dormant cancers being reactivated in any given year. This corresponds to a mean reactivation time of 10 years (1/α), with 63.2% of cancers being reactivated within 10 years (\(1 - e^{-0.1 \times 10}\)) and 86.5% reactivated within 20 years (\(1 - e^{-0.1 \times 20}\)). We used the highest risk decile (decile 10) as the reference distribution to model tumour dormancy in the remaining deciles. We assumed that the women in the highest risk decile did not experience a period of tumour dormancy and that patients in risk deciles 9 to 1 experience tumour dormancy to an increasingly greater degree. Each cancer was initially dormant and then might or might not have been activated over the 20-year follow-up period.
Normalized mortality rate distributions were generated for each value of α, ranging from 0.01 to 10.0 in increments of 0.01, using decile 10 as the baseline reference group (no dormant period; see Footnote 1). We then sought to identify, for each decile, the single value of α that generated the theoretical normalized mortality rate curve corresponding most closely to the empiric distribution. To measure goodness of fit, the mean square error (MSE) was calculated for each α value by comparing the theoretical curve with the distribution observed in that decile.
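The search over α can be sketched as follows. This is one plausible reading of the procedure, not the authors' implementation, which is described in the supplemental methods. It assumes annual time bins, an exponentially distributed dormant period (the waiting time of a Poisson process with rate α), a time-to-death equal to the dormant period plus a post-reactivation time drawn from the decile 10 reference distribution, and truncation and renormalization at 20 years. The reference distribution below is synthetic and all function names are hypothetical.

```python
import math

def fraction_reactivated(alpha, years):
    """Fraction of cancers reactivated within `years` at constant rate alpha."""
    return 1.0 - math.exp(-alpha * years)

def dormancy_weights(alpha, n_years):
    """Probability that reactivation falls in year k (k = 0 .. n_years - 1)."""
    return [math.exp(-alpha * k) - math.exp(-alpha * (k + 1)) for k in range(n_years)]

def modelled_distribution(alpha, reference, n_years=20):
    """Convolve the dormancy period with the reference time-to-death
    distribution, truncate at n_years and renormalize to sum to 1."""
    weights = dormancy_weights(alpha, n_years)
    out = [0.0] * n_years
    for k in range(n_years):
        for j in range(n_years - k):
            out[k + j] += weights[k] * reference[j]
    total = sum(out)
    return [x / total for x in out]

def mse(a, b):
    """Mean square error between two equal-length distributions."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def fit_alpha(observed, reference, grid=None):
    """Grid search over alpha = 0.01, 0.02, ..., 10.00, minimizing the MSE."""
    if grid is None:
        grid = [i / 100 for i in range(1, 1001)]
    return min(grid, key=lambda a: mse(modelled_distribution(a, reference), observed))

# Synthetic reference distribution (hypothetical, stands in for decile 10).
raw = [(j + 1) * math.exp(-0.5 * (j + 1)) for j in range(20)]
reference = [x / sum(raw) for x in raw]

# Simulate a lower-risk decile with alpha = 0.30, then recover alpha by grid search.
observed = modelled_distribution(0.30, reference)
best_alpha = fit_alpha(observed, reference)
```

Because the simulated decile is generated by the same model, the grid search recovers the generating value of α. With real data the minimum MSE would be greater than zero and would serve as the measure of fit.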

Next we aimed to identify a fitted value c for each decile which, when multiplied by the normalized mortality rates, results in the observed 20-year breast cancer-specific mortality. The multiplication factor (c), in combination with the tumour reactivation factor (α), is used to generate predicted mortality rates, Kaplan–Meier curves, and distributions of time-to-death. Fitted values for c were estimated using an iterative approach similar to that used for α estimation.
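A minimal sketch of how such a multiplier could be found is given below. It assumes that the scaled annual mortality rate in year t is c times the normalized rate, and that cumulative 20-year mortality is one minus the product of the annual survival fractions. It uses bisection rather than the grid-style iteration mentioned above, purely for brevity. The normalized rates and the target mortality are hypothetical.

```python
def cumulative_mortality(c, normalized_rates):
    """Cumulative mortality when the annual mortality rate in year t is
    c * normalized_rates[t]."""
    surviving = 1.0
    for rate in normalized_rates:
        surviving *= 1.0 - c * rate
    return 1.0 - surviving

def fit_c(normalized_rates, observed_mortality, tol=1e-10):
    """Bisection search for the multiplier c that reproduces the observed
    cumulative mortality (which must lie strictly between 0 and 1)."""
    lo, hi = 0.0, 1.0 / max(normalized_rates)  # keeps every annual rate <= 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if cumulative_mortality(mid, normalized_rates) < observed_mortality:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# Hypothetical example: a flat normalized rate over 20 years and a
# target 20-year mortality of 10%.
rates = [0.05] * 20
c_hat = fit_c(rates, 0.10)
```

Bisection works here because cumulative mortality increases monotonically with c over the search interval.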

Identifying predictors of tumour dormancy

After exploring the relationship between risk of breast cancer death and tumour dormancy, we sought to determine independent predictors of tumour dormancy. To do this, we performed quantile regression to model the median time-to-death among women who died from breast cancer. Predictors of time-to-death include year of diagnosis, age, ethnicity, tumour grade, nodal status, ER status, PR status, radiotherapy and chemotherapy. Inverse probability of censoring weights were incorporated into the quantile regression to account for differences in follow-up time between subjects (see supplemental methods).
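To show how censoring weights shift a median, the sketch below computes a weighted median, which is the intercept-only special case of weighted quantile regression at the 0.5 quantile. The authors' model includes covariates and their weights are defined in the supplemental methods. The times and weights below are hypothetical, with each weight standing for the inverse of an assumed probability of remaining under follow-up at that time.

```python
def weighted_median(times, weights):
    """Smallest time at which the cumulative weight reaches half the total."""
    pairs = sorted(zip(times, weights))
    half = sum(weights) / 2.0
    running = 0.0
    for time, weight in pairs:
        running += weight
        if running >= half:
            return time
    raise ValueError("times and weights must be non-empty")

# Hypothetical times-to-death (years) among women who died of breast cancer.
times = [1.5, 3.0, 5.5, 9.0, 16.0]
# Hypothetical weights: later deaths are up-weighted because women with
# shorter follow-up could not have been observed to die that late.
weights = [1.0, 1.0, 1.1, 1.3, 2.0]

median_unweighted = weighted_median(times, [1.0] * len(times))
median_weighted = weighted_median(times, weights)
```

In this made-up example the weighted median is later than the unweighted one, which is the direction of correction expected when late deaths are under-represented because of limited follow-up.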

Results

We divided 123,705 women into ten risk deciles based on the variables in the SEER database. The patient and tumour characteristics for each of the ten risk groups are shown in Table 2. The mean probability of death (mortality) for the 12,370 women in the lowest decile was 5.1% and for the 12,370 women in the highest decile was 69.2%. The 20-year Kaplan–Meier survival curves are presented in Fig. 1.

Table 2 Baseline characteristics of breast cancer cohort by decile of risk
Fig. 1 Kaplan–Meier curves observed for each risk decile of breast cancer patients

We sought to generate survival curves using the dormancy model, assuming patients in decile 10 (the reference group) experienced no dormancy and all other deciles varied only in the duration of tumour dormancy. That is, the differences in survival curves between risk groups depended only on differences in the tumour reactivation factor (α) and multiplier (c). Details on the generation of survival curves and distributions are available in the supplemental methods. We compared the observed and modelled curves for time-to-death for each of the ten deciles. The fit was good by inspection (Fig. 2a, b).

Fig. 2 Observed and predicted survival metrics for each decile using optimal α and c value. Time-to-death distributions for observed (a) and predicted (b) deciles, Kaplan–Meier survival curves for observed (c) and predicted (d) deciles, and biannual mortality rates and distributions for observed (e) and predicted (f) deciles

Using α and c, we generated modelled curves for annual mortality and Kaplan–Meier survival and compared these with the actual curves for the ten deciles (Fig. 2c, d and Fig. 2e, f). The fitted values for α and c for each decile are presented in Table 3. The relationship between the tumour reactivation factor (α), the c value and 20-year breast cancer mortality across the nine deciles is presented graphically in eFigure 1.

Table 3 Factors related to breast cancer mortality and time-to-death in risk subgroups (deciles)

We conducted an analysis using a quantile regression model to identify factors that predicted time-to-death. The outcome was the change in median time-to-death for a subgroup of breast cancer patients. The results of the regression model are presented in Table 4. The median time-to-death among all breast cancers was 5.67 years. On average, women with ER-positive breast cancers had a median time-to-death 1.50 years longer than this. Women with ER-negative breast cancers had a median time-to-death 2.17 years shorter. The main effect of ER status was attenuated after adjusting for other risk factors correlated with ER status. In the multivariable analysis, independent predictors of time-to-death included ethnicity, tumour size, grade, nodal status and PR status (Table 4). Neither radiotherapy nor chemotherapy was a clinically significant predictor of median time-to-death.

Table 4 Predictors of time-to-death in breast cancer patients

Stratification by estrogen receptor status

We asked if the phenomenon of tumour dormancy was a feature of all breast cancers, of ER-positive breast cancers only or of all breast cancers with low mortality. We divided the 123,705 cases of breast cancer into ER-negative and ER-positive and repeated the steps described above. Fitted values of tumour dormancy by decile for ER-positive and ER-negative patients are presented in eTable 1a and eTable 1b. Observed and modelled time-to-death curves for these subgroups are presented in Fig. 3. Kaplan–Meier survival curves and biannual mortality rates for these subgroups are presented in eFigure 2 and eFigure 3. Independent predictors of time-to-death for ER-specific regression models are presented in eTable 2.

Fig. 3 Observed and predicted time-to-death distributions for ER-positive patients (a, b) and ER-negative patients (c, d)

For ER-negative breast cancer patients the curves were similar by decile and the time-to-death did not increase as steeply as for ER-positive patients. The time corresponding to peak mortality rate ranged from 1.5 years in decile 10 to 4.0 years in decile 1. In contrast, for ER-positive breast cancers, the lower risk deciles showed a protracted time-to-death distribution. For women in the highest risk decile, the peak mortality rate was observed at 3.0 years post-diagnosis whereas for women in the three lowest mortality groups, the peak mortality rate time was in excess of 17 years. We estimated the various values of α for ER-negative and ER-positive breast cancer for each decile. For women with ER-negative breast cancer, α was less than 1.0 only among deciles 1 to 4, suggesting the importance of tumour dormancy in these low risk subgroups. For ER-positive breast cancers, the fitted α value ranged from 0.14 for decile 1 to 1.00 for decile 9.

We wished to assess the general applicability of the model for various subgroups categorized by features other than ER status, including nodal status, tumour grade and tumour size. For each of 30 patient subgroups defined by various combinations of these factors, we obtained the observed mortality and we predicted α and c values using the regression relationships presented in eFigure 1. We calculated the MSE (observed versus predicted) of the normalized mortality rates for all subgroups. The results are summarized in Table 5.

Table 5 Observed survival metrics, predicted α and c value, and fit of dormancy models among predefined patient subgroups

Discussion

We used a data set of 123,705 breast cancer patients with up to 20 years of follow-up to perform a pattern analysis of time-to-death according to various combinations of prognostic factors. The large sample size and long duration of follow-up accorded us the opportunity to examine mortality rates and times-to-death at a resolution that was not previously possible. In an earlier analysis of the same SEER data set, we showed that a general property of breast cancer patients is that the higher the annual risk of death, the greater the proportion of deaths that occur in the first 5 years [2]. For example, 62% of deaths from grade III breast cancers occur in the first 5 years, whereas only 23% of deaths from grade I breast cancers do.

Using a modelling approach, we ask if the systematic differences observed in the time-to-death and the distinct patterns of the survival curves that we described in 2018 [2] can be accounted for by variation in the duration of an early dormant period. Further, we ask if tumour dormancy is restricted to ER-positive breast cancers. We generated hypothetical survival curves under a simple dormancy model and compared these with the empiric SEER survival curves. We sought to replicate the actual data by incorporating variables which correspond to the probability of metastases being present at diagnosis and the rate of reactivation from tumour dormancy. These two variables, c and α, respectively, were used successfully to model mortality and time-to-death across 30 patient groupings. Overall, the fit was very good when the two variables were used in combination to model survival differences between the various propensity classes. For ER-negative breast cancers, most values of α were relatively large and for these the predicted dormant periods were short. For example, an α of 1.0 corresponds to a mean dormant period of one year and values above 1.0 correspond to even shorter dormant periods. Prolonged dormancy was a dominant feature in ER-positive cancer patients, and the variation in mortality and time-to-death can be predicted to a large extent by the variation in the rate of tumour reactivation. Thus, it appears that significant variation in tumour dormancy (α ≤ 1) is present among low-risk ER-negative patients (deciles 1 to 4) and all ER-positive patients (deciles 1 to 9).

Here we assume that tumour reactivation is a random event and that, within a subgroup, the annual rate of reactivation does not change during the follow-up period. However, between subgroups, the fitted values for rates of reactivation vary markedly; for women in the bottom decile of risk, we estimate the reactivation rate to be 14% per year and we expect that 93.9% of cancers will reactivate over 20 years. That is, even if an ER-positive cancer is metastatic at diagnosis, there is a possibility it will not reactivate within 20 years. For women in the top decile of risk, we estimate that almost all cancers will become active within one year.
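The 20-year figure for the bottom decile follows directly from the constant-rate assumption. With α = 0.14 per year, the probability that a dormant cancer has reactivated by time t is

\[
P(T \le t) = 1 - e^{-\alpha t}, \qquad P(T \le 20) = 1 - e^{-0.14 \times 20} = 1 - e^{-2.8} \approx 0.939,
\]

so that approximately 6.1% of such cancers would still be dormant at 20 years.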

We performed multivariable quantile regression to determine which variables are most important in predicting the length of tumour dormancy. Strong independent predictors of tumour dormancy included tumour size, grade, nodal status and PR status (Table 4). Race is also important in that black women experience an earlier time-to-death than white women, whereas east Asian women experience a median time-to-death approximately one year later than white women. Interestingly, ER-negative status (as compared to ER-positive status) was associated with a reduction in median time-to-death of only 1.7 years; this is less than what we would expect if ER status were the primary driver of tumour dormancy. The adjusted ER effect was smaller than the crude ER effect; this can largely be explained by highly correlated tumour factors within each ER subgroup.

The striking implication of our model is that the lower the mortality rate, the more unpredictable the time-to-death; for example, for women with ER-positive, node-negative, grade I/II breast cancers of less than or equal to 2 cm, the annual mortality rate was almost constant over the five to twenty-year follow-up period (eFigure 4). The interval from 3.7 years to 18.2 years contained 80% of the deaths (Table 5). If a woman has a cancer of this type, her physician may tell her that she has an 8.9% chance of dying of breast cancer in 20 years and that, on average, death will occur at 11.1 years. But she is as likely to die in year 5 as in year 20. The prolonged time-to-death of ER-positive breast cancer patients is now well recognized and others have suspected that this is due to tumour dormancy, but have not demonstrated this in a formal way [1, 6].

There are several strengths to this analysis. We followed the patients for 20 years, and this is ample time to generate characteristic survival curves for various subgroups. It can be seen by inspection of the figures that a five- or ten-year follow-up period is insufficient to speculate on the natural history of breast cancer. For example, in evaluating the performance of the BIC score in predicting recurrence in ER-positive breast cancer patients, Sestak et al. define early recurrence as < 5 years and late recurrence as 5–10 years [6]. In our data set, for women with ER-positive breast cancer in the two lowest risk deciles, the peak mortality rate was not reached until 20 years after diagnosis.

There are several weaknesses of our study. Most importantly, the SEER registry does not provide information on tamoxifen or other anti-hormonal therapies. The patients were diagnosed between 1990 and 1999, and we expect that a high proportion of the women with ER-positive breast cancers will have received tamoxifen. Further, tamoxifen is expected to increase the time-to-death [2] but this has not been captured in the current study. Ideally we would present separate mortality curves for women with and without tamoxifen. Future studies should incorporate anti-hormonal therapy where available.

There is no standard definition of tumour dormancy and most of the literature is based on animal models. In this paper, we interpret tumour dormancy as a state of inactivity; that is, the cells in the metastatic niche are not increasing in number and are not generating further metastases. We do not distinguish between an absence of cell division and a balance between cell division and cell death. At present the state of the science does not permit a formal definition of tumour dormancy and this is an area under exploration.

We have modelled our mortality curves under the assumption that, after reactivation, time-to-death is similar for all ten prognostic groups (deciles). This may not be the case; there is little epidemiologic evidence to support or to refute this assumption. In any case, we show that it is not necessary to propose variation in growth rates of various classes of tumour post-reactivation and that the empiric SEER time-to-death curves can be explained by a model which is based entirely on variation in the rate of activation.

Time from metastases to death can be divided into (1) time from metastases to activation, (2) time from activation to distant recurrence, and (3) time from distant recurrence to death. In an ideal situation, we would be able to study time from activation to distant recurrence independent of the other periods, but the time of the dormancy/activation transition is unobservable. Further, the date of first metastatic spread is unobservable and we use the date of diagnosis as a surrogate for this. However, in a recent study using in-house data from our breast cancer follow-up clinic, we studied predictors of the time from distant recurrence to death. Interestingly, we found no significant predictors of time from distant recurrence to death among 336 ER-positive breast cancer patients. Among ER-negative patients (N = 175), a high tumour grade and a short time from diagnosis to distant recurrence were associated with a rapid time-to-death [7].

It has been proposed that chemotherapy would not be effective in treating dormant cancers. If so, we would expect early administration of chemotherapy to have limited value for low-risk ER-positive cancers, because most would be in a dormant state for the first few years. Assuming an α of 0.63 in a subgroup with predominant chemotherapy use (ER-positive, decile 7), only 27% of cancers will emerge from dormancy in the first 6 months after diagnosis, which is the time frame in which chemotherapy is given. If chemotherapy were not effective against dormant tumours, we would not expect the benefit of chemotherapy to be present and we would expect the relative benefit to be greater for high-risk ER-positive cancers than for low-risk ER-positive cancers. This is not the case: in the large collaborative study of ER-positive breast cancers, the benefit of chemotherapy in terms of hazard ratio was similar for ER-negative and ER-positive cancers and across categories defined by grade and nodal status [8].
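The 27% figure can be checked under the same constant-rate model, taking the chemotherapy window as t = 0.5 years:

\[
1 - e^{-\alpha t} = 1 - e^{-0.63 \times 0.5} = 1 - e^{-0.315} \approx 0.27,
\]

which implies that approximately 73% of cancers in this subgroup would still be dormant when chemotherapy is completed.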

In conclusion, we propose that the lower the risk of death from breast cancer the more prolonged is the time-to-death distribution and the more unpredictable the clinical course. We propose that the systematic differences in time-to-death are due to differences in the period of dormancy in the initial course of the cancer. We conclude that median time-to-death is especially prolonged among ER-positive and low-risk ER-negative cancers, and the mean duration of tumour dormancy can be predicted by tumour factors such as grade, tumour size, nodal status and PR status. Large clinical epidemiology studies of women with ER-positive breast cancer may determine whether emergence from dormancy is influenced by host factors and environmental exposures or is a purely random event.