Background

Breast cancer is the most frequently diagnosed cancer and the leading cause of cancer death in women worldwide. It is a heterogeneous disease, and its subtypes show distinct clinicopathologic features, aggressiveness, responses to therapy, and survival outcomes. Triple-negative breast cancer (TNBC), the most aggressive subtype, is characterized by lack of expression of estrogen receptor (ER), progesterone receptor (PR) and human epidermal growth factor receptor 2 (HER2). Because of the natural history of this life-threatening disease and the lack of endocrine and targeted therapies, TNBC has significantly poorer outcomes than non-TNBC subtypes [1, 2]. TNBC is itself heterogeneous, comprising several distinct molecular subtypes that differ in biological features, treatment response and prognosis [3]. A nomogram is a useful and convenient tool for quantifying and predicting risk and prognosis in cancer patients. Many prognostic nomograms for breast cancer have been developed and validated based on traditional clinicopathological features [4,5,6,7,8,9,10,11,12,13,14]. However, the prognostic value of these models has been tested in only a few TNBC cohorts [6, 13, 15]. Furthermore, most of these models were developed in white rather than Asian women, and only one nomogram [13] can predict recurrence risk (most models focus on overall survival or breast cancer-specific survival). Thus, nomograms for predicting recurrence risk and survival outcomes in TNBC are scarce. Stromal tumor-infiltrating lymphocytes (TILs) have recently been reported to have important prognostic value in TNBC, in both the adjuvant and neoadjuvant settings [16,17,18,19]. Yet, to date, TILs have not been included in any breast cancer prognostic model.

In the current study, we aimed to develop nomograms predicting disease-free and overall survival in patients with non-metastatic TNBC. We used clinicopathological and molecular variables together with stromal TILs from 296 TNBC patients treated at Sun Yat-sen Memorial Hospital in China. We then externally validated the prognostic models in an independent cohort of 191 Chinese women from the Second Xiangya Hospital and Peking University Shenzhen Hospital.

Methods

Patient population and data processing

The training set comprised 296 patients with invasive TNBC who met the inclusion criteria and were diagnosed and treated at Sun Yat-sen Memorial Hospital from 2002 to 2014. Eligible women met all of the following criteria:

- aged 18 years or older;
- non-metastatic invasive breast cancer, with histology confirmed as defined by the American Joint Committee on Cancer (AJCC) (ER and PR negativity was defined as less than 1% staining by immunohistochemistry);
- complete follow-up;
- available tumor samples;
- no history of previous malignancy (except primary basal cell or squamous cell carcinoma of the skin).

In total, 435 patients aged 18 years or older with non-metastatic invasive TNBC were treated at Sun Yat-sen Memorial Hospital from 2002 to 2014. We excluded 52 (12.0%) patients with incomplete follow-up information, 84 patients whose tumor samples were not available, and 3 women with a history of previous malignancy. For patients who underwent neoadjuvant chemotherapy and had clinically negative axillary lymph nodes, sentinel lymph node biopsy was performed before neoadjuvant therapy. All pathologically node-positive patients underwent axillary lymph node dissection. An external validation cohort of 191 women with TNBC who met the same inclusion criteria was enrolled from the Second Xiangya Hospital (n = 144) and Peking University Shenzhen Hospital (n = 47) between 2007 and 2012. All patients were required to have sufficient information to score every variable in the developed nomograms.
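As a quick consistency check, the exclusion steps above can be replayed arithmetically. The short Python sketch below uses only the counts stated in the text. The variable names are illustrative.

```python
# Counts as stated in the Methods (Sun Yat-sen Memorial Hospital, 2002-2014).
screened = 435  # non-metastatic invasive TNBC, aged >= 18 years
exclusions = {
    "incomplete follow-up": 52,
    "tumor samples not available": 84,
    "previous malignancy": 3,
}

remaining = screened
for reason, n_excluded in exclusions.items():
    remaining -= n_excluded
    print(f"after excluding {reason}: n = {remaining}")

training_n = remaining
pct_incomplete = 100 * exclusions["incomplete follow-up"] / screened
validation_n = 144 + 47  # Second Xiangya Hospital + Peking University Shenzhen Hospital

print(f"training n = {training_n}, incomplete follow-up = {pct_incomplete:.1f}%")
# prints: training n = 296, incomplete follow-up = 12.0%
print(f"validation n = {validation_n}")  # prints: validation n = 191
```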
Ethical approval was obtained from the participating institutions through their respective institutional review boards (IRB) (Sun Yat-sen Memorial Hospital Ethics Committee and IRB, Ethics Committee and IRB of Peking University Shenzhen Hospital, and Ethics Committee and IRB of the Second Xiangya Hospital, Central South University), and written informed consent was obtained from study participants.

We retrieved all relevant demographic data (age, marital status, family history of breast cancer), clinicopathological features (menopausal status, histological type, grade, tumor size, node status, Ki67 index), and treatment information (surgery type, receipt of radiotherapy, chemotherapy type, chemotherapy regimen) for all included patients. Some variables (grade and Ki67 index) had missing data, which may introduce bias. To address this, we performed multiple imputation by chained equations [20,21,22] before nomogram development and validation. We created ten imputed datasets. The variables included in the imputation model were age, marital status, family history of breast cancer, menopausal status, tumor size, node status, stage, and sTIL group. Raw stromal TIL and Ki67 values were assessed in a CLIA-certified laboratory. Stromal TILs were evaluated on the original hematoxylin and eosin (H&E) sections of each TNBC included in this study, following the criteria proposed by the International TILs Working Group [23]. Briefly, all mononuclear cells in the stromal compartment within the borders of the invasive tumor were evaluated and scored as a percentage. TILs outside the tumor border, around DCIS and normal tissue, or in necrotic areas were not included in the score. One experienced pathologist evaluated stromal TILs in all cases. Ninety-eight randomly selected cases (approximately 20% of the study population) were independently annotated by a second pathologist to assess interobserver agreement.
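The text does not say how estimates from the ten imputed datasets were combined. Rubin's rules are the conventional choice for pooling a coefficient across multiply imputed datasets, so the sketch below assumes that approach. The function name `pool_rubin` and the input values are hypothetical and are not taken from the study.

```python
import math
from statistics import NormalDist, mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: per-imputation point estimates (e.g. a Cox log hazard ratio)
    variances: per-imputation sampling variances (squared standard errors)
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need at least 2 imputations and one variance per estimate")
    q_bar = mean(estimates)           # pooled point estimate
    w = mean(variances)               # within-imputation variance
    if w <= 0:
        raise ValueError("within-imputation variance must be positive")
    b = variance(estimates)           # between-imputation variance (denominator m - 1)
    total = w + (1 + 1 / m) * b       # total variance
    if b == 0:
        return q_bar, total, math.inf
    r = (1 + 1 / m) * b / w           # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2   # Rubin's degrees of freedom
    return q_bar, total, df


# Hypothetical log hazard ratios and variances from ten imputed datasets (illustrative only).
log_hr = [0.62, 0.58, 0.65, 0.60, 0.63, 0.59, 0.61, 0.64, 0.57, 0.66]
se_sq = [0.040] * 10
est, total_var, df = pool_rubin(log_hr, se_sq)
z = NormalDist().inv_cdf(0.975)  # normal approximation; a t quantile with df is more exact
half = z * math.sqrt(total_var)
print(f"pooled HR = {math.exp(est):.2f} "
      f"(95% CI {math.exp(est - half):.2f}-{math.exp(est + half):.2f})")
```

Pooling on the log-hazard scale and exponentiating afterwards keeps the interval asymmetric on the hazard-ratio scale, as is usual for Cox models.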
The endpoints were disease-free survival (DFS) and overall survival (OS). DFS was defined as the time from diagnosis to local or regional recurrence, distant metastasis, contralateral breast cancer, or death from any cause (including non-cancer death), whichever occurred first; patients without an event were censored at last contact (June 30, 2017). OS was defined as the time from diagnosis to death, with censoring at last contact (June 30, 2017).
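The endpoint definitions above translate directly into a (time, event) pair per patient, which is the input that Kaplan-Meier and Cox analyses require. The sketch below assumes a hypothetical record layout (`PatientRecord` and its field names). It converts days to months using an average month length and assumes every event date falls on or before the last-contact cutoff. None of these details come from the study.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Tuple

LAST_CONTACT_CUTOFF = date(2017, 6, 30)  # last-contact date given in the text
DAYS_PER_MONTH = 365.25 / 12             # assumption: average month length


@dataclass
class PatientRecord:  # hypothetical record layout
    diagnosis: date
    last_contact: date
    locoregional_recurrence: Optional[date] = None
    distant_metastasis: Optional[date] = None
    contralateral_cancer: Optional[date] = None
    death: Optional[date] = None  # any cause, including non-cancer death


def _months(start: date, end: date) -> float:
    return (end - start).days / DAYS_PER_MONTH


def _censor_date(p: PatientRecord) -> date:
    return min(p.last_contact, LAST_CONTACT_CUTOFF)


def dfs_endpoint(p: PatientRecord) -> Tuple[float, int]:
    """(months from diagnosis, event flag) for DFS: first DFS event, else censored."""
    events = [d for d in (p.locoregional_recurrence, p.distant_metastasis,
                          p.contralateral_cancer, p.death) if d is not None]
    if events:
        return _months(p.diagnosis, min(events)), 1
    return _months(p.diagnosis, _censor_date(p)), 0


def os_endpoint(p: PatientRecord) -> Tuple[float, int]:
    """(months from diagnosis, event flag) for OS: only death counts as an event."""
    if p.death is not None:
        return _months(p.diagnosis, p.death), 1
    return _months(p.diagnosis, _censor_date(p)), 0


p = PatientRecord(diagnosis=date(2010, 1, 1), last_contact=date(2016, 1, 1),
                  distant_metastasis=date(2012, 1, 1), death=date(2014, 1, 1))
print(dfs_endpoint(p))  # DFS event at the metastasis date, about 24 months
print(os_endpoint(p))   # OS event at death, about 48 months
```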

Statistical analysis

Survival curves for each variable were estimated with the Kaplan-Meier method and compared using the log-rank test. Prognostic factors assessed by univariable Cox analysis were entered into a backward stepwise Cox proportional hazards regression (using the Akaike information criterion) to identify statistically significant variables (P < 0.05) for inclusion in the final nomograms. Interactions were assessed by adding interaction terms to the Cox model. We tested the interactions between Ki67 and node status, Ki67 and tumor size, and tumor size and node status.

For calibration (a modified Hosmer-Lemeshow statistic for survival analysis), the nomograms were subjected to bootstrap resampling with 1000 repetitions [24] for internal validation in the training cohort and were evaluated in the validation cohort for external validation, using the rms package in R. Bootstrap resampling for internal validation was performed to reduce the overfitting bias of the model and to obtain a more reliable estimate of its predictive accuracy. External validation was performed in the independent cohort of 191 women from the Second Xiangya Hospital and Peking University Shenzhen Hospital.

The predictive accuracy and discriminative ability of the nomograms were assessed by the concordance index (C-index, a generalization of the area under the ROC curve [25]), the area under the curve (AUC), and calibration curves. We compared the predictive accuracy and discriminative ability of our nomograms with those of the seventh and eighth editions of the AJCC staging system and of the classical PREDICT [6] and CancerMath [9] models. Pairs of models were compared using previously described methods [26]. To calibrate the nomograms, we compared predicted with observed 3- and 5-year DFS and OS.
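For readers less familiar with the C-index, the sketch below implements Harrell's concordance from its definition in plain Python. It is an illustrative, simplified version (pairs with tied follow-up times are skipped), not the implementation used in the rms package. The toy data are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C for right-censored data (higher score = higher predicted risk).

    A pair is comparable when the patient with the shorter observed time had an event.
    It is concordant when that patient also has the higher risk score; tied scores count 1/2.
    Simplification: pairs with tied observed times are skipped.
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient censored: true ordering of event times unknown
        comparable += 1
        if risk_scores[first] > risk_scores[second]:
            concordant += 1.0
        elif risk_scores[first] == risk_scores[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example (hypothetical data): times in months, 1 = event, score = nomogram total points.
times = [12, 30, 45, 60, 70]
events = [1, 1, 0, 1, 0]
points = [180, 150, 90, 160, 60]
print(harrell_c_index(times, events, points))  # 7 of 8 comparable pairs concordant -> 0.875
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking, which is why it plays the same role as the AUC for censored outcomes.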
We further determined cutoff values of the predicted scores for stratifying patients into low-, intermediate-, and high-risk groups using the X-tile software program (Yale University, New Haven, CT, USA [27]). X-tile selects cut-points by the maximal chi-square test after sorting all patients by total score. To address the problem of multiple cut-point testing, X-tile produces corrected P values using Monte Carlo simulations. Kaplan-Meier curves were then plotted for each risk group. Statistical analyses and modeling were performed using Stata (version 13; StataCorp, College Station, TX) and R software packages. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
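X-tile is a stand-alone program, so its internals are not reproduced here. As a simplified, hypothetical analogue, the sketch below scans a single cut-point on the nomogram score and keeps the one with the largest two-group log-rank chi-square. X-tile's search over two cut-points (for three groups) and its Monte Carlo P-value correction are not implemented. The function names and the `min_group` parameter are illustrative.

```python
def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    data = sorted(zip(times, events, in_group))
    n_at_risk = len(data)
    n1_at_risk = sum(1 for _, _, g in data if g)
    obs_minus_exp = 0.0
    var = 0.0
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = deaths1 = leaving = leaving1 = 0
        while i < len(data) and data[i][0] == t:  # all patients sharing time t
            _, event, g = data[i]
            deaths += event
            deaths1 += event if g else 0
            leaving += 1
            leaving1 += 1 if g else 0
            i += 1
        frac1 = n1_at_risk / n_at_risk
        obs_minus_exp += deaths1 - deaths * frac1  # observed minus expected in group 1
        if n_at_risk > 1:
            var += deaths * frac1 * (1 - frac1) * (n_at_risk - deaths) / (n_at_risk - 1)
        n_at_risk -= leaving
        n1_at_risk -= leaving1
    return obs_minus_exp ** 2 / var if var > 0 else 0.0


def best_single_cutpoint(scores, times, events, min_group=10):
    """Return (cut, chi2) maximizing the log-rank statistic; score >= cut is high risk.

    No correction for testing many cut-points is applied, so chi2 is optimistic.
    """
    best = None
    for cut in sorted(set(scores))[1:]:
        high = [s >= cut for s in scores]
        n_high = sum(high)
        if min(n_high, len(scores) - n_high) < min_group:
            continue
        chi2 = logrank_chi2(times, events, high)
        if best is None or chi2 > best[1]:
            best = (cut, chi2)
    return best
```

Choosing the cut-point that maximizes the statistic inflates the apparent significance. This is the reason X-tile reports Monte Carlo-corrected P values and the reason the cut-offs were checked in the validation cohort.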

Results

Study population characteristics

The training cohort included 296 women with invasive non-metastatic TNBC treated at Sun Yat-sen Memorial Hospital, with a median follow-up of 52.5 months. During follow-up, 78 DFS events and 46 deaths occurred. The independent validation cohort comprised 191 women diagnosed with operable invasive TNBC at the Second Xiangya Hospital (n = 144) and Peking University Shenzhen Hospital (n = 47), with a median follow-up of 68 months. In this cohort, 51 DFS events and 32 deaths occurred. Because some variables (grade and Ki67 index) had missing data (less than 20%), multiple imputation was performed before nomogram development and validation. Demographic and clinicopathological characteristics of patients in the training and validation cohorts before and after multiple imputation are shown in Table 1.

Table 1 Demographic and clinicopathological characteristics of patients in the training and validation cohorts before and after multiple imputation

Prognostic nomogram for DFS

In the training set, DFS curves for the demographic, clinicopathological and treatment variables were estimated with the Kaplan-Meier method and compared using the log-rank test. The variables selected in the final multivariable Cox regression model were stromal TILs, tumor size, node status, and Ki67 index (Table 2). A nomogram incorporating these four prognostic variables was then developed (Fig. 1a), which we named triple-negative recurrence (TNR). Each category of these variables was assigned a score on the point scale (Additional file 1: Table S1). To use the TNR nomogram, a TNBC patient's values are located on the nomogram and a total score is calculated. From this score, the patient's 3- and 5-year DFS can be predicted. The nomogram showed good discrimination in both the training and validation cohorts, with C-indices of 0.743 and 0.784 and AUCs of 0.777 and 0.783 (Fig. 2), respectively.
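The scoring procedure described above follows the usual construction of a Cox-model nomogram:

1. Each category's log hazard ratio is rescaled to points.
2. The points are summed.
3. The total is mapped back to the linear predictor to obtain S(t | x) = S0(t)^exp(lp).

The sketch below illustrates that arithmetic only. Every coefficient, category, and baseline-survival value in it is hypothetical and does not reproduce the TNR nomogram.

```python
import math

# HYPOTHETICAL Cox coefficients (log hazard ratios) by category. These are NOT the TNR
# model's estimates; the real point assignments are in Additional file 1: Table S1.
COEFS = {
    "stromal_tils": {"high": 0.0, "intermediate": 0.4, "low": 0.8},
    "tumor_size": {"<=2 cm": 0.0, ">2 cm": 0.5},
    "node_status": {"N0": 0.0, "N1": 0.6, "N2-3": 1.2},
    "ki67": {"<40%": 0.0, ">=40%": 0.5},
}
BASELINE_DFS = {3: 0.95, 5: 0.92}  # HYPOTHETICAL S0(t) for the all-reference profile

# Common convention: the variable with the widest coefficient range spans 0-100 points.
MAX_RANGE = max(max(c.values()) - min(c.values()) for c in COEFS.values())
MIN_LP = sum(min(c.values()) for c in COEFS.values())


def variable_points(patient):
    """Points per variable for one patient (dict of variable -> category)."""
    return {var: 100 * (COEFS[var][patient[var]] - min(COEFS[var].values())) / MAX_RANGE
            for var in COEFS}


def dfs_from_total_points(total_points, years):
    """Map total points back to the linear predictor, then S(t | x) = S0(t) ** exp(lp)."""
    lp = MIN_LP + total_points * MAX_RANGE / 100
    return BASELINE_DFS[years] ** math.exp(lp)


patient = {"stromal_tils": "low", "tumor_size": ">2 cm", "node_status": "N1", "ki67": ">=40%"}
pts = variable_points(patient)
total = sum(pts.values())
print({k: round(v, 1) for k, v in pts.items()}, round(total, 1))
print(f"3-year DFS {dfs_from_total_points(total, 3):.2f}, "
      f"5-year DFS {dfs_from_total_points(total, 5):.2f}")
```

Because points are a linear rescaling of the Cox linear predictor, reading predicted DFS off the Total Points axis is equivalent to evaluating the model directly.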

Table 2 Univariable and multivariable analysis of training set for DFS
Fig. 1

Prognostic nomograms for predicting (a) DFS and (b) OS of patients with non-metastatic TNBC. To use the nomograms, locate the patient's value on each variable axis and draw a line to the Points axis to determine the score for that value. Sum the scores and locate the total on the Total Points axis to read the predicted 3-, 5-, and 10-year DFS or OS for that individual.

Fig. 2

Discriminatory accuracy for predicting DFS, assessed by receiver operating characteristic (ROC) analysis and AUC: 5-year DFS in the (a) training cohort and (b) validation cohort. TNR = triple-negative recurrence; AUC = area under the curve

Prognostic nomogram for OS

In the training set, OS curves were generated for the same demographic, clinicopathological and treatment variables initially considered for the DFS nomogram. The variables in the final multivariable Cox regression model were stromal TILs, tumor size, node status, and Ki67 index (Table 3). A nomogram incorporating these four prognostic variables was then developed (Fig. 1b), which we named triple-negative survival (TNS). Each category of these variables was assigned a score on the point scale (Additional file 1: Table S1). To use the TNS nomogram, a TNBC patient's values are located on the nomogram and a total score is calculated. From this score, the patient's 3- and 5-year OS can be predicted. The nomogram showed good discrimination in both the training and validation cohorts, with C-indices of 0.791 and 0.783 and AUCs of 0.813 and 0.784 (Fig. 3), respectively.

Table 3 Univariable and multivariable analysis of training set for OS
Fig. 3

Discriminatory accuracy for predicting OS, assessed by receiver operating characteristic (ROC) analysis and AUC: 5-year OS in the (a) training cohort and (b) validation cohort. TNS = triple-negative survival; AUC = area under the curve

Calibration of nomograms and comparison with AJCC staging, PREDICT and CancerMath

Calibration plots showed acceptable agreement between model predictions and observed outcomes for 3- and 5-year DFS and OS in both the training and validation cohorts (Additional file 2: Figure S1).

In the training cohort, the C-index of our model for predicting DFS (0.743, 95% CI 0.692–0.794) was significantly higher than those of the seventh and eighth editions of the AJCC TNM staging system (0.666, 95% CI 0.611–0.721, P = 0.003; and 0.664, 95% CI 0.605–0.723, P = 0.024, respectively). Its C-index for predicting OS (0.791, 95% CI 0.735–0.847) was also significantly higher than those of the two TNM systems (0.683, 95% CI 0.613–0.753, P = 0.004; and 0.677, 95% CI 0.606–0.748, P < 0.001).

Similarly, in the validation cohort, the C-index of our model for predicting DFS (0.784, 95% CI 0.724–0.844) was higher than those of the seventh and eighth TNM systems (0.632, 95% CI 0.518–0.746, P = 0.02; and 0.607, 95% CI 0.554–0.660, P = 0.002). The same held for OS (0.783, 95% CI 0.705–0.861, versus 0.656, 95% CI 0.516–0.796, P = 0.006; and 0.606, 95% CI 0.535–0.677, P = 0.001).

We also compared the predictive accuracy and discriminative ability of our nomograms with those of two classical breast cancer models. The AUC of our model for OS was 0.813 in the training cohort and 0.784 in the validation cohort. These values were larger than those of PREDICT and CancerMath, which were 0.752 and 0.767, respectively, in the training cohort and 0.766 and 0.751, respectively, in the validation cohort (Fig. 3).
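The calibration plots compare predicted with observed survival. One simple way to produce such a comparison is sketched below under stated assumptions: patients are grouped by predicted t-year survival, and each group's mean prediction is compared with its Kaplan-Meier estimate. The tertile grouping, function names, and toy data are hypothetical. The study's plots were produced with the rms package and bootstrap resampling, which this sketch does not reproduce.

```python
def km_survival_at(times, events, t):
    """Kaplan-Meier estimate of S(t); events are processed before censorings at tied times."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        current = data[i][0]
        deaths = leaving = 0
        while i < len(data) and data[i][0] == current:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
        at_risk -= leaving
    return surv


def calibration_table(pred_surv, times, events, t, n_groups=3):
    """Group patients by predicted S(t); return (mean predicted, KM observed) per group."""
    order = sorted(range(len(pred_surv)), key=lambda k: pred_surv[k])
    n = len(order)
    rows = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(pred_surv[k] for k in idx) / len(idx)
        observed = km_survival_at([times[k] for k in idx], [events[k] for k in idx], t)
        rows.append((mean_pred, observed))
    return rows


pred5 = [0.55, 0.60, 0.72, 0.78, 0.85, 0.90]  # hypothetical predicted 5-year DFS
follow_up = [20, 70, 40, 80, 90, 100]          # months
status = [1, 0, 1, 0, 0, 0]                    # 1 = DFS event
for predicted, observed in calibration_table(pred5, follow_up, status, t=60):
    print(f"predicted {predicted:.2f} vs observed {observed:.2f}")
```

Points close to the 45-degree line (predicted equal to observed) indicate good calibration.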

Performance of the nomogram in stratifying risk of patients

We then defined cutoff values with the X-tile program, grouping patients in the training cohort into three risk groups after sorting by total DFS or OS score (Additional file 1: Table S1). The three groups had significantly different survival outcomes (Additional file 1: Table S1; Fig. 4a and b). The same cutoff values also stratified patients in the validation cohort into low-, intermediate-, and high-risk groups with clearly distinct prognoses (Fig. 4c and d).

Fig. 4

Risk group stratification in the training and validation cohorts. DFS curves of patients in the (a) training cohort and (c) validation cohort by nomogram (TNR) score group; OS curves of patients in the (b) training cohort and (d) validation cohort by nomogram (TNS) score group. TNR = triple-negative recurrence; TNS = triple-negative survival

Discussion

To the best of our knowledge, this is the first study to combine stromal TILs with clinicopathological variables to predict prognosis in TNBC patients. Our nomograms, TNR and TNS, were developed in Chinese TNBC patients treated at Sun Yat-sen Memorial Hospital and showed AUCs of 0.777 for DFS and 0.813 for OS in the training cohort. Their discriminatory accuracy was confirmed in an independent external validation population from the Second Xiangya Hospital and Peking University Shenzhen Hospital, with AUCs of 0.783 for DFS and 0.784 for OS. In addition, our nomograms had significantly higher C-indices than the seventh and eighth editions of the AJCC TNM staging system for predicting DFS and OS. They also had larger AUCs than the classical prognostic models PREDICT and CancerMath, although the improvement over these models was modest.

Predictive nomograms are useful tools for assessing outcomes and risk in breast cancer, and many such models have been developed based on clinicopathological features and receptor status [4,5,6,7,8,9,10,11,12,13,14]. Nevertheless, most of these models were developed in white patients from North America and Europe, and many focused on OS or breast cancer-specific survival (BCSS) rather than DFS. One nomogram [13] can predict recurrence risk. However, it may not generalize to external populations because it was developed in patients from a single large, well-known institution (MD Anderson Cancer Center), which may introduce referral and treatment bias.

The clinicopathological features and prognosis of breast cancer are known to vary by race/ethnicity. For instance, the average age at onset in Asian women is approximately 10 years lower than in Western women [28,29,30,31,32,33]. Models developed in Western patients may therefore have limited value for Asian patients with breast cancer. A nomogram developed in Taiwanese women exists, but it can predict OS only for patients treated with mastectomy [12]. Furthermore, the prognostic value of these existing models has been tested in only a few TNBC cohorts [6, 13, 15], even though TNBC is a heterogeneous disease composed of several distinct subtypes with markedly different prognoses.

A predictive model for TNBC based on a simple sum of ≥4 positive lymph nodes, positive Cathepsin-D expression, and a Ki-67 index ≥20% has been reported previously [34]. However, the score assigned to each variable was not well justified, and the model included patients from only a single institution. Moreover, it showed smaller AUCs for predicting survival than our model in both the training (0.696) and validation (0.717) sets.

Furthermore, compared with existing nomograms, TNR/TNS incorporate several new and potentially widely applicable predictive or prognostic factors for TNBC, including stromal TILs and Ki67 index. The prognostic significance of TILs in TNBC has recently been demonstrated in a number of randomized clinical trials, in both the neoadjuvant and adjuvant settings [16,17,18,19]. In 2014, the International TILs Working Group released detailed guidelines for harmonizing TIL assessment in routine samples [23], and in this study we assessed stromal TILs strictly according to these guidelines. To our knowledge, these are the first nomograms for predicting outcomes in TNBC to incorporate stromal TILs. In addition, previous studies have shown that TNBC with a higher Ki-67 index is associated with larger tumor size, more positive nodes, and worse prognosis [35, 36]. Our findings suggest that a Ki-67 cut-off of 40% may be adequate to demonstrate an association with recurrence and unfavorable survival in TNBC.

Despite these strengths, our nomograms are limited by the retrospective nature of data collection and the relatively small sample size. Another limitation is that some calibration plots for the validation cohort were less than ideal. In addition, TNR and TNS were developed in Chinese TNBC patients, so it remains unclear whether they can be applied to Western patient cohorts. Prospective data collection, larger patient cohorts, and validation in other geographic populations are needed to improve our nomograms.

Conclusions

We developed and validated novel, well-calibrated nomograms that, for the first time, incorporate stromal TILs to predict DFS and OS in Chinese patients with non-metastatic TNBC. These prognostic nomograms can help clinicians with risk counseling and management and with identifying likely long-term survivors among TNBC patients. Further studies are required to determine whether they can be applied to other geographic patient populations.