Background

Breast cancer is the most common malignancy in women worldwide and the second leading cause of tumor-related death in women [1, 2]. Approximately 266,120 new cases of invasive breast cancer and 40,920 breast cancer deaths were expected to occur among US women in 2018 [1]. Between 5 and 10% of patients have metastatic breast cancer (MBC) at initial diagnosis (de novo MBC). Accurately estimating the prognosis of these patients helps greatly in clinical decision-making. However, most prognostic models were developed for early-stage breast cancer [3, 4]. Thus, effective prediction models for patients with de novo MBC are needed.

Breast cancer is heterogeneous, with diverse clinical, histopathologic and molecular features, including age at diagnosis, race, differentiation grade, molecular subtype, and site of metastasis. These characteristics have been reported to be associated with the survival of de novo MBC patients [5, 6]. Chemotherapy and radiation therapy remain the mainstay of treatment for MBC patients. Primary tumor resection is not routinely recommended because MBC is considered incurable [7, 8]; it is considered only as a means of palliation. However, many retrospective analyses have reported a survival benefit of primary tumor resection [9,10,11,12]. These factors may interact, leading to distinct outcomes across individual patients.

A nomogram is a reliable and accurate graphical model that combines the risk factors identified in multivariate analysis; it is widely used to predict survival in oncology [13, 14]. In this study, we developed and validated a nomogram to predict the survival of de novo MBC patients, using a large cohort of well-characterized patients identified from the Surveillance, Epidemiology and End Results (SEER) database.

Methods

Patients

Data were obtained from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) program, which consists of 18 population-based cancer registries, for patients diagnosed between 2010 and 2016. SEER is an open-access resource that provides tumor-based demographic and pathological information, as well as treatment information and patient survival outcomes. SEER*Stat Version 8.3.4 (http://www.seer.cancer.gov/seerstat) was used to identify eligible patients.

Because the SEER database began collecting information on human epidermal growth factor receptor-2 (HER2) status and sites of distant metastasis in 2010, that year was used as the starting point. The inclusion criteria for MBC patients were as follows: female sex; year of diagnosis from 2010 to 2016; age older than 18 years at diagnosis; breast cancer as the first and only malignant tumor diagnosis; histology of infiltrating duct carcinoma (IDC) and/or lobular carcinoma (ILC); and at least one distant site of de novo metastasis. Patients with unknown marital status, race, differentiation grade, T stage, N stage, site of metastasis, or follow-up information were excluded.

Demographic variables included age at diagnosis (≤ 40, 40–60, and > 60 years), race (white, black, and others, including American Indian/Alaska Native and Asian or Pacific Islander) and marital status (married and unmarried, the latter including divorced, separated, widowed, single (never married) or domestic partner). Tumor characteristics included histology (IDC, ILC, and IDC and ILC), grade (grade I, grade II and grade III/IV), molecular subtype, T stage (T1, T2, T3, and T4), N stage (N0, N1, N2 and N3), and metastatic site (bone, brain, liver, and lung). Therapies included chemotherapy (No/Unknown and Yes), radiation (No/Unknown and Yes), and surgery of the primary tumor (No, Mastectomy, and Breast conservation surgery (BCS)). Estrogen receptor (ER) and progesterone receptor (PR) status were combined as the hormone receptor (HR) status, and the breast cancer molecular subtype was stratified based on joint HR and HER2 status (HER2−/HR+, HER2+/HR−, HER2+/HR+, and HER2−/HR−).
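The subtype stratification described above can be sketched as a small function. This is an illustrative sketch only: the text does not state how ER and PR were combined, so the rule used here (HR+ when either receptor is positive) is an assumption, and the function names are hypothetical rather than taken from the study's code.

```python
def hormone_receptor_status(er_positive: bool, pr_positive: bool) -> str:
    """Combine ER and PR into a single HR status.

    Assumption (not stated in the text): HR+ if either ER or PR is positive.
    """
    return "HR+" if (er_positive or pr_positive) else "HR-"


def molecular_subtype(er_positive: bool, pr_positive: bool, her2_positive: bool) -> str:
    """Return one of the four joint HER2/HR categories as a label.

    An ASCII hyphen stands in for the minus sign in the labels.
    """
    her2 = "HER2+" if her2_positive else "HER2-"
    return f"{her2}/{hormone_receptor_status(er_positive, pr_positive)}"


# Example: an ER-positive, PR-negative, HER2-negative tumor
print(molecular_subtype(True, False, False))
```

Under this assumed rule, a tumor negative for ER, PR and HER2 falls into the HER2-/HR- category, which corresponds to triple negative breast cancer.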

Statistical analysis

Patient demographics, tumor characteristics and treatment information were compared using the chi-square test. Overall survival (OS) was defined as the time from breast cancer diagnosis to death from any cause. Patients in the initial cohort were randomly allocated to a training cohort and a validation cohort at a ratio of 2:1. The training cohort was used to develop the nomogram, while the validation cohort was used to validate it. In the training cohort, the covariates included in the multivariate Cox proportional hazards model were selected by a backward stepwise method based on the smallest Akaike information criterion (AIC) value, which indicates the minimal loss of prognostic information [15, 16].
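The allocation and selection steps can be illustrated with a minimal sketch. It is not the authors' code: the seed, the function names and the `aic_of` callback are hypothetical, and the callback stands in for fitting a Cox model and returning its AIC, computed as 2k − 2 ln L for k parameters and maximized (partial) likelihood L.

```python
import random


def split_two_to_one(patient_ids, seed=0):
    """Randomly allocate patients to training and validation cohorts at 2:1."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # the seed is an arbitrary illustrative choice
    n_train = (2 * len(ids)) // 3
    return ids[:n_train], ids[n_train:]


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood


def backward_stepwise(covariates, aic_of):
    """Backward elimination: drop one covariate at a time while AIC decreases.

    aic_of is a hypothetical callback that fits a model on the given
    covariates and returns its AIC.
    """
    current = list(covariates)
    best = aic_of(current)
    while len(current) > 1:
        # Score every model obtained by removing a single covariate.
        candidates = [(aic_of([c for c in current if c != d]), d) for d in current]
        cand_aic, drop = min(candidates)
        if cand_aic >= best:
            break  # no removal lowers the AIC any further
        best = cand_aic
        current.remove(drop)
    return current, best


train, valid = split_two_to_one(range(7986))
print(len(train), len(valid))
```

With 7986 patients, this split yields cohorts of 5324 and 2662 patients. In practice the callback would wrap a Cox model fit in a statistics package.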

The nomogram was developed on the basis of the independent risk factors using the “rms” R package. The predictive capacity of the nomogram was assessed using Harrell’s C-index (the concordance statistic, or C-statistic) and the area under the time-dependent receiver operating characteristic curve (AUC), both of which measure how well the predicted OS discriminates between patients with different observed OS. Bootstrapping with 1000 resamples was used to generate the calibration curves for validation of the nomogram in the training cohort and in the validation cohort. The score of each variable was calculated using the “nomogramEx” package in R. From these scores, the total score for each patient could be calculated.
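To make the concordance statistic concrete, the following is a minimal pure-Python sketch of Harrell’s C-index. It is an illustration under simplifying assumptions, not the routine used in the “rms” package: pairs with tied survival times are ignored, ties in the risk score count as one half, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ranked correctly by the risk score.

    A pair (i, j) is comparable when patient i had an observed event
    strictly before the follow-up time of patient j. The pair is concordant
    when the patient who died earlier has the higher predicted risk.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier event in a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration): months of follow-up, event flag, risk score
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risk = [0.9, 0.6, 0.7, 0.2]
print(harrell_c_index(times, events, risk))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. In the toy data, 4 of the 5 comparable pairs are ordered correctly.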

All analyses were performed with SPSS (version 24.0; SPSS, Inc., Chicago, IL) and R version 3.6.0 (http://www.r-project.org). Statistical significance was assumed at a two-sided p value of < 0.05.

Results

Patient characteristics

We included 7986 patients with de novo MBC in the final analysis. The flowchart of the patient selection process is shown in Fig. 1. The median follow-up time was 36 months (range: 0–83 months). The median age at diagnosis was 59 years. Most of the patients (75.1%, 5999) were white. Of the tumors, 50.3% (4018) were poorly differentiated or undifferentiated. HR+/HER2− was the most common subtype (57.6%) among MBC patients, followed by HR+/HER2+ (19.0%) and triple negative breast cancer (TNBC) (13.6%), while HR−/HER2+ was the least common subtype (9.8%). The most common site of metastasis was bone (73.4%, 5865), while the least common was brain (6.9%, 548). In total, 27.5% (2196) of patients received surgery for the primary tumor, of whom 37.9% (832) underwent BCS. Radiotherapy was given to 35.0% (2797) of patients and chemotherapy to 61.8% (4937). The 1-, 3-, and 5-year OS rates were 74.5, 45.3, and 28.2%, respectively.

Fig. 1 The flowchart of the patient selection process

The included 7986 patients were allocated randomly into the training cohort (N = 5324) and the validation cohort (N = 2662). The demographic, pathological and treatment information of the two cohorts is listed in Table 1. The distribution of these factors was balanced in the training and validation cohorts. The median OS of the training and validation cohorts was 38 months (interquartile range, 13–66 months) and 39 months (interquartile range, 12–68 months), respectively.

Table 1 The demographic, pathological and treatment information of MBC patients diagnosed in 2010–2016 in the SEER database

Nomogram construction

In univariate analysis, age at diagnosis, race, marital status, differentiation grade, molecular subtype, T stage, bone metastasis, brain metastasis, liver metastasis, lung metastasis, surgery, radiotherapy and chemotherapy were associated with OS (p < 0.05, Table 2). The smallest AIC value was obtained when 12 factors were incorporated into the multivariate Cox analysis: age at diagnosis, race, marital status, differentiation grade, molecular subtype, T stage, bone metastasis, brain metastasis, liver metastasis, lung metastasis, surgery and chemotherapy (AIC = 6606.9). Figure 2 shows the nomogram for predicting the 1-, 3- and 5-year OS probability. Each value of these factors was allocated a score on the points scale. Adding up these scores gave the total score, which was used to estimate the 1-, 3- and 5-year survival probability for each individual patient.
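The scoring step can be sketched as a simple lookup and sum. The point values below are hypothetical placeholders chosen only to show the mechanics; they are not read from the published nomogram, and only three of the 12 factors are included to keep the example short.

```python
# HYPOTHETICAL point values for illustration only; NOT taken from Fig. 2.
HYPOTHETICAL_POINTS = {
    "molecular_subtype": {"HR+/HER2+": 0, "HR+/HER2-": 20, "HR-/HER2+": 35, "TNBC": 100},
    "brain_metastasis": {"no": 0, "yes": 50},
    "surgery": {"BCS": 0, "mastectomy": 10, "none": 40},
}


def total_points(patient, points=HYPOTHETICAL_POINTS):
    """Sum the points assigned to each of the patient's factor values.

    Raises KeyError if a factor or value is missing from the points table.
    """
    return sum(points[factor][patient[factor]] for factor in points)


patient = {"molecular_subtype": "TNBC", "brain_metastasis": "no", "surgery": "none"}
print(total_points(patient))
```

In an actual nomogram, the total score is then read against the survival axes to obtain the predicted 1-, 3- and 5-year probabilities; that mapping is model-specific and is not reproduced here.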

Table 2 The prognostic factors identified in the univariate and multivariate Cox regression models in the training cohort incorporating covariates identified by the smallest AIC value
Fig. 2 Nomogram predicting 1-, 3- and 5-year overall survival for de novo MBC patients

Nomogram validation and calibration

The nomogram was validated in both the training cohort and the validation cohort. The C-index was 0.723 (95% CI, 0.713–0.733) in the training cohort and 0.719 (95% CI, 0.705–0.734) in the validation cohort. In the training cohort, the AUC values of the nomogram for predicting 1-, 3- and 5-year OS were 0.784 (95% CI, 0.752–0.816), 0.777 (95% CI, 0.757–0.798) and 0.786 (95% CI, 0.768–0.803), respectively. In the validation cohort, the AUC values for predicting 1-, 3- and 5-year OS were 0.802 (95% CI, 0.762–0.841), 0.784 (95% CI, 0.757–0.811) and 0.790 (95% CI, 0.765–0.814), respectively (Fig. 3). The calibration plots for the probability of OS indicated good agreement between the nomogram-predicted and observed 1-, 3- and 5-year OS in both the training cohort and the validation cohort (Fig. 4).

Fig. 3 The 1-, 3- and 5-year receiver operating characteristic curves in the training (a) and validation (b) cohorts

Fig. 4 The calibration plots for predicting patient survival at the 1-, 3- and 5-year time points in the training cohort (a, b, c) and the validation cohort (d, e, f)

Discussion

The survival of patients with de novo MBC is difficult to predict because of the lack of prediction models for these patients. In this study, we developed a nomogram to predict the survival of de novo MBC patients identified from the SEER database. The model was validated and its performance was evaluated. Calibration plots showed good agreement between the observed risks and the risks estimated by the nomogram, indicating the reliability of this model. Discrimination was evaluated by the C-index and AUC values, both of which indicated that the nomogram discriminated well between patients with different outcomes.

Several nomograms have been developed to predict the survival of MBC patients [17,18,19,20]. C. K. Lee et al. focused on predicting survival of MBC patients with relapsed disease [17]; Giovanni Corso et al. used a nomogram to predict the risk of developing relapsed disease [20]; studies by S. R. Li et al. and Z. C. Xiong et al. combined de novo MBC patients and those with relapsed disease [18, 19]. However, many studies have shown that women with de novo MBC represent a group distinct from women with relapsed breast cancer [21,22,23]. Patients with de novo MBC usually have better survival than those whose metastatic disease developed after an initial diagnosis of earlier-stage disease. One hypothesis explaining the better outcome of de novo MBC than recurrent MBC is the use of adjuvant systemic therapy in patients with relapsed disease: because more resistant or aggressive clones are selected during adjuvant therapy, the metastatic disease of recurrent MBC becomes more resistant to therapy. Thus, recurrent MBC patients should not be analyzed together with de novo MBC patients. In our study, we included only de novo MBC patients; to our knowledge, this is the first nomogram to predict the survival of patients with de novo MBC.

MBC is a heterogeneous disease. Many factors affect the prognosis and the therapeutic efficacy of drugs. The molecular subtype is a vital prognostic factor and serves as the cornerstone of treatment [6, 24, 25]. According to the expression status of ER, PR and HER2, breast cancer can be divided into four subtypes: HR+/HER2−, HR−/HER2+, HR+/HER2+ and TNBC, which is characterized by the absence of ER, PR and HER2. In our analysis, HR+/HER2− was the most common subtype (57.6%) among MBC patients, followed by HR+/HER2+ (19.0%) and TNBC (13.6%), while HR−/HER2+ was the least common subtype (9.8%). In the nomogram, molecular subtype played a major role in the scoring system. The TNBC subtype yielded the highest score, consistent with previous reports [5, 24, 26]. The site of distant metastasis has been reported to correlate with the survival of MBC patients. Patients with bone metastasis showed the best prognosis and those with brain metastasis showed the worst prognosis [5, 27,28,29]. The score distribution of metastatic sites in the nomogram was consistent with these results. It has also been reported that among MBC patients, molecular subtype correlates tightly with the preferred metastatic site [5, 27, 30]. Even among patients with metastasis to the same site, molecular subtype showed a significant prognostic role. Age at diagnosis, marital status and differentiation grade also had an impact on survival. In our analysis, we combined all these prognostic factors to construct the nomogram, in order to predict the survival of a specific patient with de novo MBC accurately and to identify patients with a favorable prognosis. Those at a low risk of mortality should be given aggressive multidisciplinary therapy.

MBC is considered incurable. Systemic therapy remains the mainstay of therapy [7]. Over the past 2 decades, the survival of MBC patients has improved dramatically due to the development of targeted therapy and palliative care [31,32,33]. In our analysis, the 1-, 3-, and 5-year OS rates were 74.5, 45.3, and 28.2%, respectively. However, the prognostic role of primary tumor resection has not been determined. In this study, we found that MBC patients benefited from surgery of the primary tumor. This finding is in agreement with conclusions reported in other retrospective studies [9, 34, 35]. Because of the selection bias inherent in retrospective studies, a protective role of surgery cannot be concluded directly. Prospective randomized clinical trials have investigated the role of primary tumor resection in MBC patients and reached contradictory conclusions [36, 37]. These results suggest that primary tumor resection may improve the survival of a subset of patients, but it remains to be determined who should receive primary tumor resection and when the surgery should be performed.

This study has several limitations. First, it was a retrospective study and was subject to all the inherent biases associated with this type of study design. Second, some prognostic factors were not available in the SEER database, including the number of metastatic lesions, use of endocrine therapy and use of targeted therapy. Third, the nomogram was validated in the same population, so the estimate of model performance could be biased. Therefore, the predictive performance of the nomogram needs to be assessed carefully in other cohorts.

Conclusion

The developed nomogram reliably predicted OS in patients with de novo MBC and showed favorable discrimination ability. Using this model, the role of primary tumor surgery and other significant prognostic factors in MBC patients could be estimated. This may help guide surgical decision-making in clinical practice, although the findings require additional validation.