Background

A locoregional recurrence (LRR) carries a high risk of distant metastasis, and thus confers a poor prognosis [1]. LRRs are defined as the reappearance of breast cancer at the same site as the primary tumour, in the chest wall or ipsilateral, infraclavicular, supraclavicular or parasternal lymph nodes after curative treatment [2]. Factors that influence the risk of recurrence include tumour size, age, vascular invasion, multifocality, histological grade, hormone receptor status and treatment of the primary tumour [3–13]. Regular follow-up is aimed at detecting LRRs at an early stage to improve survival [14]. In the Netherlands, patients are followed clinically for at least 5 years after their treatment. Still, most of the recurrences are detected by the women themselves in between follow-up visits and some are detected after the 5 years of clinical follow-up [15, 16]. In a Dutch multicentre study, Geurts et al. [14] found that only 34 % of the LRRs were detected asymptomatically during routine visits. Due to the increase in survival, the burden of follow-up on health care is rising. Even though the risk factors are known, follow-up is the same for all patients and does not depend on the personal risk of the individual breast cancer patient. Since 2012, the national guideline of the Netherlands has recommended an individualised follow-up by shared decision making, but it does not provide recommendations on how to implement it. To achieve this, good insight into time-dependent individual LRR risk is necessary.

Statistical models that are used for predicting the outcomes of patients are called prognostic models. Many prognostic models appear to be adequate at the population level. However, their use to predict risks at the level of the individual patient is questionable. Patients and clinicians need accurate risks at the individual patient level to reach more informed and uniform decision making. Challenges are incomplete knowledge on causality and the existence of various risk factors with only a small effect [17, 18]. For the prediction of breast cancer, the first model was developed by Gail et al. [19]. This model, as well as other well-known models (e.g. BRCAPRO, BOADICEA [20, 21]), is aimed at predicting the general risk of primary breast cancer. To get towards personalised follow-up, models predicting LRRs are required. In this paper, logistic regression is used to calculate not only the single risk for the overall follow-up period of 5 years, but also the annual time-dependent risk. To facilitate uptake in clinical practice, ease of use and accessibility are crucial. This can be achieved by using a nomogram: a graphical representation of the underlying model. Our aim is to develop and validate a time-dependent logistic regression model and nomogram suitable for the annual risk prediction of LRRs in individual breast cancer patients. Knowing this individual risk could facilitate the decision on a personalised follow-up plan.

Patients and methods

Study population

Patients were selected from the Netherlands Cancer Registry (NCR), a nationwide population-based registry, which records all newly diagnosed tumours since 1989. The information on patient, tumour and treatment characteristics, as well as data concerning recurrences within the first 5 years following primary breast cancer were recorded from the patient files by specially trained registration clerks.

Women diagnosed with primary invasive breast cancer between 2003 and 2006 without distant metastasis, previous, or synchronous tumours (diagnosed within 3 months after the first tumour [22]), treated with curative intent and without neo-adjuvant systemic treatment were selected from the registry (n = 37,230). Curative intent was defined as surgical removal of the primary tumour without macroscopic residual disease. Adjuvant treatment should have been received in case of microscopic residue. In the first 5 years following primary breast cancer treatment, 950 (2.6 %) of the selected patients developed a LRR as a first event. For external validation, data were used of a cohort of 12,308 patients from a selection of Dutch hospitals (43 out of 91) that developed their primary breast cancer between the years 2007 and 2008. Of these patients, 275 (2.2 %) were diagnosed with a LRR.

Although second primary breast cancers (any epithelial breast cancer with or without lymph node metastasis in the contralateral breast [2]) are also of interest with regard to follow-up care, they are not included in the model. Second primary tumours are a different entity from the primary tumour, and are hard to predict based on the available clinical variables [23–25]. Patients with a known genetic predisposition (estimates vary between 3 and around 7 % [26–28]) are not part of the regular follow-up. Unless they underwent a double mastectomy, they undergo a separate, more intensive follow-up.

Model development

Variables were selected based on literature and availability of the data. As the effect of age on LRR risk is nonlinear, it was discretized into four groups (<50, 50–59, 60–69, ≥70). The patient, tumour and treatment characteristics shown in Table 1 were assessed for their influence on recurrence risk using multivariable binary logistic regression analysis. By means of backward elimination, we deleted variables from the initial model until only variables with a P value of <0.157 (Akaike information criterion) were maintained in the model. A last check was performed by adding and removing the variables one by one. Firstly, a prediction model for the 5-year LRR risk was developed. Secondly, risks were determined per year conditional on not being diagnosed with recurrence in the previous year(s). Interaction was tested by adding interaction terms to the model. A correlation matrix was composed to assess possible correlation between the variables. Variables with a high correlation coefficient (>0.7 or <−0.7) were excluded. With a ratio of around 100:1, there were enough events for the included variables in the model. Based on simulation studies, it was determined that the ratio should be at least 10:1 [29].
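The relation between the yearly conditional risks and a cumulative risk can be illustrated with a minimal sketch. It assumes one log-odds value per follow-up year for a single patient; the values, the function names and the way the years are combined are illustrative assumptions and do not reproduce the fitted models. The combination rule is the general probability identity for risks that are each conditional on no event so far.

```python
import math

def logistic(linear_predictor: float) -> float:
    """Convert a log-odds value (linear predictor) into a probability."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))

def cumulative_risk(annual_conditional_risks):
    """Combine annual risks, each conditional on no recurrence in the
    previous year(s), into one cumulative risk."""
    recurrence_free = 1.0
    for p in annual_conditional_risks:
        recurrence_free *= 1.0 - p  # probability of staying recurrence free
    return 1.0 - recurrence_free

# Hypothetical log-odds for years 1 to 5 (illustrative, not the fitted model).
log_odds_per_year = [-5.0, -4.6, -4.8, -5.1, -5.3]
annual = [logistic(lp) for lp in log_odds_per_year]
five_year = cumulative_risk(annual)
print([round(p, 4) for p in annual], round(five_year, 4))
```

Because every yearly risk applies only to patients still free of recurrence, the cumulative risk is slightly smaller than the plain sum of the yearly risks.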

Table 1 Patient and tumour characteristics

The percentage of missing values of the included variables ranged between 0 and 24 % (PR status). ER and PR status were not registered by the NCR on a regular basis in 2003 and 2004. The variables of the prediction model with missing values were multiply imputed using a chained equation approach [30–32]. Calculations were performed with the MICE package of R. It was assumed that missing values occurred randomly, which justifies the use of imputation. A comparison with the complete case analysis was made, as well as an assessment of the convergence. The analyses were repeated on the imputed data and pooled by using Rubin’s rules.
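As an illustration of the pooling step, the following sketch applies Rubin’s rules to one regression coefficient estimated on several imputed data sets. The coefficients and standard errors are hypothetical values chosen for the example, and the function name is an assumption; the actual pooling in this study was done in R.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool one parameter over m imputed data sets with Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled_estimate = mean(estimates)
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled_estimate, math.sqrt(total)

# Hypothetical log odds ratios and standard errors from five imputations.
coefs = [0.50, 0.54, 0.47, 0.52, 0.49]
ses = [0.10, 0.11, 0.10, 0.12, 0.10]
pooled_coef, pooled_se = pool_rubin(coefs, [s ** 2 for s in ses])
print(round(pooled_coef, 3), round(pooled_se, 3))
```

The pooled standard error is never smaller than the average within-imputation standard error, because the between-imputation term reflects the extra uncertainty caused by the missing values.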

Validation

Prognostic validity or discrimination refers to the capability to discern between high- and low-risk patients [33]. It was measured by the Harrell c-statistic, which for a binary outcome equals the area under the receiver operating characteristic (ROC) curve. A c-statistic of 1.0 indicates perfect predictive ability, whereas 0.5 represents no predictive discrimination. Calibration, i.e. whether the predicted probabilities accord with the observed ones, was evaluated by the Hosmer–Lemeshow goodness-of-fit test in deciles. A P value above 0.05 (indicating no significant difference between the model and the data) is generally considered to indicate a satisfactory goodness-of-fit. The difference between the observed and predicted probabilities was plotted for graphical assessment of the calibration.
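Both measures can be sketched in a few lines. The code below is a plain illustration under simplifying assumptions, not the STATA or R routines used in this study: the c-statistic is computed by comparing all event and non-event pairs (quadratic in the number of patients, so only suitable for small examples), and the Hosmer–Lemeshow function returns only the chi-square statistic, which would still have to be compared with a chi-square distribution (commonly with the number of groups minus 2 degrees of freedom). Function names and example data are hypothetical.

```python
def c_statistic(predicted, outcomes):
    """Share of event/non-event pairs in which the patient with the event
    has the higher predicted risk; ties count as one half."""
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in non_events:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))

def hosmer_lemeshow_statistic(predicted, outcomes, groups=10):
    """Chi-square statistic comparing observed and expected numbers of
    events across risk-ordered groups (deciles by default).
    Assumes the mean predicted risk in each group lies strictly between 0 and 1."""
    ranked = sorted(zip(predicted, outcomes))
    n = len(ranked)
    statistic = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_p = expected / size
        statistic += (observed - expected) ** 2 / (size * mean_p * (1.0 - mean_p))
    return statistic

# Hypothetical predictions and outcomes (1 = LRR, 0 = no LRR).
risks = [0.10, 0.40, 0.35, 0.80]
lrr = [0, 0, 1, 1]
print(c_statistic(risks, lrr))
```

In the example, three of the four event/non-event pairs are ordered correctly, so the c-statistic is 0.75.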

To see if the model can effectively differentiate between women who will develop a LRR and women who will not, the model was validated. For internal validation, bootstrapping (n = 1000) was used because it provides stable estimates [34]. If the shrinkage factor from the validation is over 0.85, it is considered satisfactory [35]. External validation was performed by regression analyses on the validation cohort. Areas under the ROC curves were compared using the jackknife method proposed by DeLong et al. [36]. A P value < 0.05 was considered statistically significant. Analyses were performed using STATA version 13 and R 3.1.1 software (http://www.r-project.org). The nomogram was developed using HTML and jQuery (JavaScript).

Results

After backward elimination, the model included the variables grade, size, multifocality and nodal involvement of the primary tumour, type of surgery, and whether patients were treated with radio-, chemo- or hormone therapy (Table 2). Assessment of the correlations revealed a high correlation between type of surgery and use of radiotherapy (correlation coefficient -0.8). Since radiotherapy showed a higher influence on the risk, type of surgery was omitted from the model. Due to high correlation between the oestrogen (ER) and progesterone (PR) receptor status, they were combined into one variable (ER/PR negative versus other). Inclusion of interaction terms did not improve the model. The patients in the index and validation cohort had small differences in the included variables age, grade, size, lymph node status, hormone status and treatments (all <3 % per category, Table 1). Healthy convergence was achieved with the multiple imputations.

Table 2 Logistic regression estimates

Validation

Table 3 details the discrimination and calibration properties of the prediction model. The c-statistic was 0.71 for the 5-year risk of LRR (95 % confidence interval [CI] 0.69–0.73), indicating good discriminating ability. For the first to fifth year after primary treatment, the index group showed an area under the ROC curve of 0.84, 0.76, 0.70, 0.73 and 0.65, respectively. The predictions were well calibrated, as shown by the Hosmer–Lemeshow goodness-of-fit test (Fig. 1). For the deciles, the average expected to observed ratio was 1.05 and the P value 0.28, indicating a high agreement between the predictions and observations.
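The expected to observed ratio per risk group can be illustrated as follows. This is a minimal sketch with hypothetical data and a hypothetical function name; groups without observed events are skipped here because the ratio is undefined for them, which is a choice made for the example only.

```python
def expected_observed_ratios(predicted, outcomes, groups=10):
    """Expected/observed ratio of events per risk-ordered group.
    Groups without observed events are skipped (ratio undefined)."""
    ranked = sorted(zip(predicted, outcomes))
    n = len(ranked)
    ratios = []
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        observed = sum(y for _, y in chunk)
        if observed == 0:
            continue
        expected = sum(p for p, _ in chunk)  # sum of predicted risks
        ratios.append(expected / observed)
    return ratios

# Hypothetical example with two risk groups of five patients each.
risks = [0.2] * 5 + [0.6] * 5
lrr = [1, 0, 0, 0, 0, 1, 1, 1, 0, 0]
ratios = expected_observed_ratios(risks, lrr, groups=2)
print(sum(ratios) / len(ratios))
```

A ratio above 1 means that the model predicts more events than were observed in that group (overestimation), and a ratio below 1 means underestimation.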

Table 3 Model validation

Fig. 1 Calibration chart

Internal validation in the index group with 1000 times bootstrapping revealed a shrinkage factor of 0.98 for the 5-year risk estimates (Table 3). In the external validation, all effects in the validation group were in the same direction, and the estimates in the validation group did not differ significantly from the index group. Tumour size, chemotherapy and hormone therapy had a slightly higher influence in the validation cohort (Table 2). The comparison between the ROC curves from the index and validation group can be found in Fig. 2.

Fig. 2 ROC curves of the index (n = 37,230) and validation (n = 12,308) cohort for 5-year LRR risks

The models based on the imputed data were embedded in the nomogram which is available on http://www.utwente.nl/mira/influence. Figure 3 provides a screenshot of the nomogram which shows the time-dependent risk of a theoretical patient aged between 50 and 59, with a T2M0N1, grade II, hormone status negative primary tumour, who did receive hormone therapy, but no radio- or chemotherapy.

Fig. 3 Print screen from the nomogram, providing the time-dependent risk of a fictional patient

Discussion

This study describes the development and validation of the first-ever time-dependent logistic regression model for the prediction of the annual risk of LRR of breast cancer, developed based on data from 37,230 patients. The model takes into account the age of the patient, grade, size, multifocality, and nodal involvement of the primary tumour, and whether patients were treated with radio-, chemo- or hormone therapy. The risk factors used in our model are filtered from the population-based registry and are readily available in (Dutch) clinical practice and for use of the nomogram, without extra efforts or data gathering. Validation displayed only a small overestimation of the risk of developing a LRR (as could be expected with large sample sizes [37]).

In a systematic review on primary breast cancer risk prediction models, it was found that calibration of most models was sufficient [38]. However, discriminatory accuracy was considered poor to fair (c-statistic of 0.52–0.66) after internal validation. Reasons provided were lack of knowledge on risk factors, the different subtypes of breast cancer and discrepancies between risk factors across populations [38]. In this study, both calibration and discrimination (c-statistic of 0.71 after validation) were satisfactory. The individual risk estimates do show uncertainty, particularly in the later years, so risk estimates still need to be interpreted with caution. With nodal involvement being the strongest risk factor (odds ratio (OR) 2.9 for >3 nodes compared to negative nodes for the 5-year risk, up to OR 8.5 for the risk in the first year), the effects of the included factors are modest. For instance, Thrift et al. [17] advocate that for prediction of individual risks, the relative risk of a factor should exceed ten for it to be a good predictor of individual risk (even though this does not guarantee discriminatory accuracy). Consequently, individual predictions should be improved by decreasing the unexplained variation. Based on the conventional clinical risk factors, this is not to be expected. Hence, more research is needed to discover new characteristics with discriminative ability [18].

This study had a number of strengths, including data on many variables associated with risk of LRR and a large sample size. Also, the sample size of the validation cohort was appropriately large, as a minimum of 100 events and 100 non-events was proposed by Vergouwe et al. [39] for an external validation population. A correction for possible subsequent recurrences was unfortunately not feasible, because only first and synchronous recurrences are registered in the NCR. Although information on other known risk factors such as vascular invasion and breast density was unavailable and could not be taken into account, the nomogram can be updated to incorporate more variables when they become available in clinical practice and registries [40]. Of note, our analysis showed that Her2-Neu and primary tumour morphology were not independent predictors of LRR. These findings are in contrast to those of previous studies [10, 41]. This could be due to the fact that all Her2-Neu positive patients are treated with Herceptin in the Netherlands. Our nomogram was based on data of almost all diagnosed early primary breast cancers between 2003 and 2006; thus, the results should be generalizable to the Dutch population. Another strength is the presentation of the conditional risk through time instead of only a 5-year risk estimate, which enables the clinician to give a better assessment of the risk over time for patients and adjust the follow-up plan accordingly.

The difference in treatment between the index and validation cohort can be attributed to changing guidelines over time. If the risk of LRR is high, the use of adjuvant treatment could be considered. However, this is outside the scope of this study, as the model is targeted at patients who have completed their treatment. The nomogram can be improved with automatic updating: new patients will cause adjustments of the estimates, and will weigh more than less recent ones, to better tailor the model to current clinical practice.

User-friendly access through a nomogram is beneficial for both patients and clinicians. Still, it remains important that the users understand the correct interpretation. Therefore, it is of great importance to present the estimates with the corresponding CI [42]. Widely used nomograms such as Adjuvant! Online (adjuvant treatment decisions) [43], the nomograms from Memorial Sloan Kettering Cancer Center (among others, the likelihood that breast cancer has spread to sentinel lymph nodes) [44] or IBTR! (benefit of adjuvant radiotherapy) [45] do not display these intervals, which makes it hard to appreciate the certainty of the risk estimates.

Current guidelines for follow-up after breast cancer, aimed at detecting LRRs at an early, asymptomatic stage, prescribe equal follow-up for every patient. This research shows there is great variability in the risk of LRR, underlining the need for an individualised follow-up. With simulation modelling, thresholds can be found for when to schedule the visits, so that individual follow-up schedules can be developed from the yearly risk predictions. This will lower the burden on both patients and care providers, as well as on health care resources.

Conclusion

This nomogram, based on a time-dependent logistic regression model for the prediction of the annual risk of LRR of breast cancer, is simple to use and shows good predictive ability in the Dutch population. It can be used as an instrument to identify patients with a high risk of LRR who might benefit from a less or more intensive follow-up after breast cancer and to aid clinical decision making.