Introduction

Breast cancer remains a major health concern in the United States with an estimate of over 297,000 new cases for 2023 [1]. While survival rates have improved for breast cancer patients with advances in multimodality therapies, surgical resection with negative margins remains the standard of care for most patients. Most early-stage breast cancer patients are candidates for breast-conserving therapy (lumpectomy) or mastectomy for surgical resection given that multiple randomized controlled trials have demonstrated equivalent long-term survival outcomes [2,3,4]. Regardless of surgical strategy, margin status of the resected specimen remains one of the most important factors associated with recurrence after breast cancer surgery [5].

Positive surgical margins are defined as malignant cells identified at the edge of the resection specimen and have been associated with an at least twofold increase in ipsilateral breast cancer recurrence [6, 7], higher distant recurrence rates, and shorter survival [8]. Further, patients with positive margins are candidates for re-excision of the involved margin [6, 9], and these subsequent surgeries place a significant burden on both patients and the healthcare system. While several clinical factors have been associated with the risk of positive margins, including higher stage, higher grade, non-ductal histology, HER2 amplification, and suspicion of multifocality, there is a paucity of data considering both clinical variables and genomic profiles associated with positive margins [10, 11].

In this study, we evaluated the clinical and pathologic factors associated with breast cancer surgical margins using the data for breast cancer (BRCA) from the public resource, The Cancer Genome Atlas (TCGA). In addition to clinical data analysis, exploration of molecular data was also performed in order to identify the genes potentially associated with positive margins.

Materials and methods

TCGA-BRCA data

The TCGA-BRCA patient data, including clinical data and sample annotations, were downloaded from the Genomic Data Commons (GDC) portal. The RNA-Seq data for the corresponding samples were also downloaded from GDC using the TCGAbiolinks R package [12]. The survival data were obtained from the TCGA Pan-Cancer study and integrated with the clinical data [13]. Since the number of male patients was small and all of them had negative margin status, we excluded them to avoid introducing additional bias. We also removed the redacted samples and filtered cases using sample annotations. Finally, 951 cases (75 with positive and 876 with negative margins) were retained for this study. The samples were categorized into positive and negative margin groups based on the margin status assigned after the first tumor removal surgery.

The 951-sample cohort included primary tumors from patients diagnosed with breast cancer from 1988 to 2013 and had a median follow-up period of 2.2 years. Characteristics of the cohort were examined using frequency distributions, and attributes with low numbers were grouped together as “Other.” Age at diagnosis ranged from 26 to 90 years and was grouped into three categories: old (60+ years), middle age (40–59 years), and young (< 40 years).

Clinical data analysis

Clinicopathologic factors for margin prediction were evaluated using logistic regression models. Subsequently, Fisher’s exact test was performed to test the association of each factor with margins. The impact of each clinicopathologic feature on disease progression was evaluated using univariable and multivariable Cox proportional hazards regression with the recommended endpoint, progression-free interval (PFI) [13]. The outcome of interest was the time from the date of diagnosis to local recurrence, distant metastasis, or death from the disease, whichever came first. For margin status, overall survival (OS) was also estimated in addition to PFI. Significant factors from the univariable analysis were subjected to multivariable analysis to explore their effect on survival. To better understand how each significant factor in the multivariable model related to survival, a bi-variable survival analysis was also performed. All analyses were carried out in R. All statistical tests were two-sided, and P values ≤ 0.05 were considered significant.
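The association test named above can be illustrated with a short sketch. The study's analyses were run in R; the Python below is only a standard-library illustration of a two-sided Fisher's exact test on a 2 × 2 table, and the function name and the example counts are hypothetical rather than taken from the TCGA-BRCA cohort.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows could be, for example, two levels of a clinical factor and
    columns positive / negative margin status (an illustrative layout).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum the probabilities of every table no more likely than the observed one
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )
    return min(1.0, p_value)


# Hypothetical counts, not data from the study
print(fisher_exact_two_sided(8, 2, 1, 5))
```

The small tolerance in the comparison guards against floating-point ties between tables that are equally probable.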

Molecular data analysis

A matched subset (n = 142) of the current TCGA dataset was selected for molecular analysis; all the cases with a positive margin that had tumor stage reported were included (n = 71), and the negative margin cases (n = 71) were selected by matching primarily on tumor stage and PAM50 subtype. Other features such as race, age, and menopausal status were matched as closely as possible. Principal component analysis (PCA) was performed to assess the distribution of gene expression across PAM50 subtypes [14] and margin status. The RNA-Seq data analysis was performed using the DESeq2 package [15] with adjustment for PAM50 subtype and tumor stage. A 5% False Discovery Rate (FDR) and a fold change of 2 were set as the significance criteria. The significant differentially expressed genes (DEGs) from DESeq2 were further subjected to LASSO regression [16] using the caret package [17] in order to prevent multicollinearity and to extract the potential gene markers. A 10-fold cross-validation was performed to obtain the minimum lambda, which was used in LASSO regression to predict the signature genes. Prediction models were evaluated using Leave-One-Out Cross-Validation (LOOCV) [18] to validate the gene signature. Additionally, pathway analysis was performed on the gene list from the DESeq2 result using the GSEA Preranked test tool against the Hallmark gene set collection [19, 20].
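The significance criteria described above (5% FDR and a fold change of 2) can be sketched as a simple filter. This is a standard-library Python illustration, not the DESeq2 workflow itself: it assumes Benjamini-Hochberg adjustment as the FDR procedure and log2 fold changes as input, and the gene names, values, and function names are hypothetical.

```python
from math import log2


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fold_changes, pvalues, fdr=0.05, min_fold=2.0):
    """Keep genes passing both the FDR and the fold-change threshold."""
    padj = benjamini_hochberg(pvalues)
    threshold = log2(min_fold)  # a fold change of 2 equals |log2FC| of 1
    selected = []
    for gene, lfc, q in zip(genes, log2_fold_changes, padj):
        if q <= fdr and abs(lfc) >= threshold:
            selected.append((gene, "up" if lfc > 0 else "down"))
    return selected


# Hypothetical inputs, not results from the study
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
log2_fc = [1.5, -1.2, 0.4, -2.0]
pvals = [0.01, 0.04, 0.03, 0.005]
print(select_degs(genes, log2_fc, pvals))
```

Whether the thresholds are applied as strict or non-strict inequalities is an assumption here; the text does not specify it.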

Results

Positive margin is significantly associated with higher tumor stage and lumpectomy

The probability of attaining positive margins after surgery was observed to be significantly (p ≤ 0.05) associated with higher tumor stage, larger tumor size and chest wall involvement (T4), positive lymph nodes (N2, N3), and distant metastasis (M1), based on univariable logistic regression models and Fisher’s exact test (Table 1). The type of first surgery to remove the tumor also influenced margin status, with lumpectomy (as reference) having a significantly higher chance of positive margins than mastectomy (Simple Mastectomy: p = 0.002, Odds Ratio (OR) = 0.30, Confidence Interval (CI) = 0.13 − 0.62; Modified Radical Mastectomy: p < 0.001, OR = 0.30, CI = 0.15 − 0.57). Among PAM50 subtypes, the Luminal A subtype (as reference) was significantly associated with positive margins in the univariable regression model compared to the Basal-like (Basal) subtype (p = 0.05, OR = 0.44, CI = 0.18 − 0.94). The Her2-enriched (Her2) subtype was associated with positive margins (OR = 1.39), although this was not significant (p = 0.397). The results of Fisher’s exact test were consistent with the logistic regression results except for PAM50 subtype, which did not show any association with margin status.
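To make the odds ratios above easier to read, the sketch below computes an odds ratio against a reference group from a 2 × 2 table, which is what a univariable logistic regression with a single binary predictor estimates. It is a standard-library Python illustration under stated assumptions: the counts are hypothetical (not those of Table 1), and the interval is a Wald interval, which may differ from the interval method used in the study.

```python
from math import exp, log, sqrt


def odds_ratio_wald(group_pos, group_neg, ref_pos, ref_neg, z=1.96):
    """Odds ratio of a group versus a reference group, with a Wald 95% CI.

    group_pos / group_neg: positive- and negative-margin counts in the group
    ref_pos / ref_neg: the same counts in the reference group
    """
    odds_ratio = (group_pos * ref_neg) / (group_neg * ref_pos)
    # Standard error of log(OR) from the four cell counts
    se = sqrt(1 / group_pos + 1 / group_neg + 1 / ref_pos + 1 / ref_neg)
    lower = exp(log(odds_ratio) - z * se)
    upper = exp(log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical counts: mastectomy versus lumpectomy (reference)
or_value, ci_low, ci_high = odds_ratio_wald(10, 190, 30, 170)
print(f"OR = {or_value:.2f}, 95% CI = {ci_low:.2f} to {ci_high:.2f}")
```

An odds ratio below 1 with an interval that excludes 1 reads the same way as the mastectomy results above: lower odds of a positive margin than in the reference group.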

Table 1 Summary of clinical characteristics of TCGA-BRCA data (n = 951) and their association with margin status

The significant factors in the univariable regression model (Stage, PAM50, TNM: T = Tumor size, N = Lymph Node status, M = Metastasis, Type of first surgery) were used in the multivariable model with margin status as the response variable. Tumor stage, size, and lymph node status, which were highly significant in the univariable model, were no longer significant in the multivariable model (Supplementary Table S1). Further evaluation using various multivariable models indicated that Stage and TNM were confounding (Supplementary Table S2); hence, only Stage was used in the final multivariable model (Table 2). The final regression model, in agreement with the univariable model, showed that diagnosis at a higher tumor stage (Stage III: p < 0.001, OR = 4.85, CI = 2.09 − 12.41; Stage IV: p < 0.001, OR = 80.83, CI = 18.65 − 411.45) was significantly associated with positive margins. Similarly, for the type of surgery for tumor removal, the multivariable regression model confirmed that lumpectomy (as reference) was significantly associated with positive margins compared to simple mastectomy (p = 0.002, OR = 0.27, CI = 0.12 − 0.59) and modified radical mastectomy (p < 0.001, OR = 0.17, CI = 0.08 − 0.35). For the PAM50 subtypes, Luminal A (as reference) was significantly associated with positive margins compared to the Basal subtype (p = 0.042, OR = 0.41, CI = 0.16 − 0.91).

Table 2 Multivariable logistic regression analysis on the association of margin status with significant clinical features from univariable analysis

Effect of margin status and other factors on disease progression

Of the 951 cases included in our study, one was excluded from survival analysis due to missing follow-up information. The univariable survival models using 950 cases for margin status showed that positive margins were significantly associated with worse survival with both PFI (p < 0.001) and OS (p = 0.006) as endpoints (Fig. 1). In addition to margin status, stage, TNM, PAM50 subtype, and hormone receptor (Estrogen Receptor (ER), Progesterone Receptor (PR)) status were significantly associated with disease progression (Table 3). In the survival models based on histology, mucinous carcinoma showed a significant survival difference compared to ductal carcinoma (Table 3). However, as the sample size (n = 15) and the number of events (n = 3) were low for mucinous carcinoma (Supplementary Fig. S1), these results were regarded as unreliable. It is worth noting that the type of first surgery, though significantly associated with margin status, did not significantly impact survival.

Fig. 1 The Kaplan–Meier (K–M) curves for cumulative survival in years for margin status for two end points: progression-free interval (PFI) (a) and overall survival (OS) (b). P value, Hazard ratio (HR), and the number of events ‘/’ number of cases are given in the legends of plots
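Kaplan–Meier curves such as those in Fig. 1 are built from the product-limit estimator. The sketch below is a minimal standard-library Python illustration of that estimator, not the code used in the study; the follow-up times and event flags are hypothetical, and subjects censored at the same time as an event are assumed to be still at risk at that time.

```python
def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times: follow-up time for each subject (for example, years)
    events: 1 if the event (progression or death) occurred, 0 if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        current_time = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all subjects whose follow-up ends at this time
        while i < len(data) and data[i][0] == current_time:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((current_time, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up data, not taken from TCGA-BRCA
for time, prob in kaplan_meier([1, 2, 2, 3, 4, 5], [1, 0, 1, 1, 0, 1]):
    print(time, round(prob, 4))
```

Running the estimator separately on the positive- and negative-margin groups would give the two step functions that a K–M plot compares.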

Table 3 Univariable survival analysis to assess the effect of each clinicopathologic factor on disease progression (Progression Free Interval, PFI)

To assess the combined effect on survival of margin status and the other factors that were significant in the univariable model, multivariable survival analysis was performed. TNM, though significant in the univariable model, was excluded from the multivariable models since tumor stage is derived from TNM and the inclusion of both features in the same model was observed to be confounding in the previous logistic regression model. Surprisingly, PAM50 and ER status were not significant in this model (Supplementary Table S3). Further exploration using different multivariable models (Supplementary Tables S4–S5) indicated that hormone receptor status and PAM50 confounded each other; hence, only PAM50 was retained in the final model (Table 4). Higher tumor stages (III and IV) and the Basal and Her2 subtypes contributed significantly (p ≤ 0.05) to disease progression in the final model, while margin status was not significant (p = 0.135, HR = 1.54, CI = 0.88 − 2.70). The bi-variable survival models (Table 5) demonstrated that margin status remained highly significant when PAM50 or either hormone receptor (ER/PR) status was added to the model, whereas in the model with tumor stage, margin status was only close to significance (p = 0.067).

Table 4 Final Cox proportional hazards regression model for multivariable survival analysis
Table 5 Assessment of impact of each significant factor from univariable models on margin status using bi-variable survival analysis with Progression Free Interval (PFI) as endpoint

Association of gene expression with margin status identified 29 DEGs

To address the sample imbalance between positive and negative margins, a matched dataset (n = 142; Supplementary Table S6) was extracted from our cohort to perform unbiased molecular analyses. Principal component analysis (PCA) of matched samples using 2000 highly variable genes did not clearly cluster the samples by margin status but clustered them instead by PAM50 subtype (Supplementary Fig. S2). Differential expression analysis between positive and negative margin cases identified 53 upregulated and 50 downregulated DEGs, and the subsequent LASSO regression selected 29 DEGs for the prediction of margin status (Supplementary Table S7). Unsupervised clustering of these 29 genes produced largely subtype-driven clusters (Fig. 2). We also observed two top-level clusters with different positive margin enrichment (~ 59% for the left cluster, ~ 41% for the right cluster, Fisher’s exact p value = 0.044). This shows that the genes can, to some degree, separate positive margin cases from negative margin cases. Leave-One-Out Cross-Validation (LOOCV)-based prediction models with the 29 genes showed an accuracy of 0.7.
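The LOOCV accuracy reported above comes from holding out one sample at a time, training on the rest, and counting correct predictions. The sketch below shows that loop in standard-library Python. The nearest-centroid classifier is a simple stand-in chosen for illustration, not the prediction model used in the study, and the expression values are hypothetical.

```python
def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample."""
    sums, counts = {}, {}
    for features, label in zip(train_x, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], features)]
        counts[label] += 1
    best_label, best_distance = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        # Squared Euclidean distance is enough for ranking centroids
        distance = sum((a - b) ** 2 for a, b in zip(sample, centroid))
        if distance < best_distance:
            best_label, best_distance = label, distance
    return best_label


def loocv_accuracy(features, labels):
    """Leave-one-out cross-validation accuracy."""
    correct = 0
    for i in range(len(features)):
        # Train on every sample except the i-th, then predict the i-th
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


# Hypothetical two-gene expression values for six samples
expression = [[1.0, 1.2], [0.9, 1.0], [1.1, 0.8], [3.0, 3.1], [2.9, 3.3], [3.2, 2.8]]
margin = ["negative", "negative", "negative", "positive", "positive", "positive"]
print(loocv_accuracy(expression, margin))
```

With n samples the loop fits n models, which is affordable for a cohort of this size.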

Fig. 2 Unsupervised clustering for 29 significant genes derived using LASSO regression from DESeq2 analysis for TCGA RNA-Seq data

Among the 29 genes, 16 were upregulated and 13 were downregulated in positive margin cases. They included 17 protein-coding genes, 4 pseudogenes (AC084880.1, BEND3P1, CPHL1P, AP002001.2), and 8 long non-coding RNA (lncRNA) genes (AC004947.1, AC008663.2, AC099329.2, LINC01344, SLC26A4-AS1, AF015262.1, AC114296.1, LINC00589).

Pathway analysis identified 8 differentially expressed pathways (Table 6) between positive and negative margin cases. The 7 upregulated pathways include three cell proliferation associated pathways (E2F_TARGETS, G2M_CHECKPOINT, MYC_TARGETS_V1); two cell signaling-related pathways (ESTROGEN_RESPONSE_LATE, ESTROGEN_RESPONSE_EARLY); and two immune-related pathways (INTERFERON_ALPHA_RESPONSE, TNF_SIGNALING_VIA_NFKB). The only downregulated pathway was associated with progression and metastasis (EPITHELIAL_MESENCHYMAL_TRANSITION).

Table 6 Significant pathways observed in TCGA RNA-Seq data (n = 142) using GSEA Preranked test

Discussion

Characterization study to determine effects of factors on margin status

Here, we analyzed the association of both clinicopathologic and molecular factors with the occurrence of positive margins in breast cancer. The incidence rate of positive margins (7.9%; 75/951) in TCGA-BRCA was comparable to other studies [21]. We observed that the risk of positive margins increased with higher tumor stage, larger tumor size, positive lymph nodes, and presence of distant metastasis, consistent with prior studies [21,22,23,24]. Conversely, age, which has previously been reported to be associated with margin status, was not significant in our analysis [22]. This discordance could be attributed to the low number of young patients in the TCGA-BRCA cohort.

At the molecular level, our study demonstrated that the immunohistochemistry (IHC) markers ER, PR, and HER2 did not impact margin status, in concordance with the findings described by Horattas et al. [25]. Evaluation of PAM50 intrinsic subtypes in our study, however, demonstrated that the Luminal A subtype had a higher risk of positive margins compared with the Basal subtype. There is a paucity of literature considering the impact of PAM50 subtypes on margin status, and this novel finding appears counterintuitive, given that the Basal subtype is associated with higher recurrence rates [26]. The higher incidence of positive margins in the Luminal A subtype may be attributed to morphologic characteristics noted in radiomic studies, which attribute spiculated features more commonly to Luminal subtypes and circumscribed features more commonly to Basal subtypes [27].

The survival analysis demonstrated that positive margin status, larger tumor size, positive lymph nodes, distant metastasis, hormone receptor status (ER-negative, PR-negative), higher tumor stage, and two of the PAM50 subtypes (Basal and Her2) significantly contributed to disease progression. This conclusion agrees with previous studies, including those using TCGA data, even though margin status was not evaluated in prior TCGA-BRCA data studies [13, 28, 29]. The bi-variable survival models indicated that margin status acted independently of PAM50 and hormone receptor status. They also suggested that margin status might be a surrogate for tumor stage.

Mastectomy does not guarantee negative margins

Per current guidelines, mastectomy is typically indicated for breast cancer patients with larger tumor size relative to breast size, inflammatory breast cancer, or multicentric disease, for patients who prefer it, and for patients with a contraindication to breast-conserving therapy. Patients may prefer a mastectomy over a lumpectomy for a variety of reasons, including a decreased risk of positive margins. Our findings are consistent with the literature regarding the higher risk of positive margins in patients undergoing lumpectomy [6]. Interestingly, Hewitt et al. reported that in patients with large invasive lobular carcinoma (ILC) tumors, mastectomy fails to obtain clear margins [30]. Of the 23 patients with positive margins after mastectomy (simple/radical) in our cohort, 15 had invasive ductal carcinoma (IDC) and 7 had ILC. It is important to note that 21 of these 23 tumors belonged to the higher stage group (Stage III or IV). This again emphasizes that higher stage patients have a higher chance of undergoing re-excision irrespective of the type of first surgery or histology.

Biomarkers for margin status

To our knowledge, this is the first study to evaluate the impact of gene expression on margin status. The 29 genes obtained after molecular data analysis included many well-known cancer markers. The 12 protein-coding genes upregulated in the positive margin group included several well-known tumor markers for breast cancer (EEF1A2, EPHA6, FOXN4, SOX15) [31,32,33,34,35]. In addition, there were genes that have been reported in other cancers but have been little explored in breast cancer. Low expression of Alpha-1-Microglobulin/Bikunin Precursor (AMBP) has been reported to increase tumor progression in prostate cancer [36] and oral squamous carcinoma [37], but the gene is relatively understudied in breast cancer. A study exploring expression of this gene across different cancers reported it to be downregulated in breast cancer [38]. However, this gene was found to be overexpressed in our positive margin cohort. Similarly, Iodothyronine Deiodinase 1 (DIO1), a gene involved in the activation and inactivation of thyroid hormone whose low expression has been reported to promote tumor progression, was also upregulated in patients with positive margins [39]. These findings for the AMBP and DIO1 genes may merit further study. In addition, we identified 6 upregulated genes not typically associated with cancer (ARC, C1orf167, CNGA3, KRT75, SPRR1B, STUM).

Five downregulated protein-coding genes were identified in the positive margin group, including SPINK1, a well-known tumor marker [40, 41]. Potassium Inwardly Rectifying Channel Subfamily J Member 6 (KCNJ6) was also downregulated in positive margin cases. Potassium channel-driven signaling is known to regulate metastasis in triple negative cancer [42]. Anoctamin 3 (ANO3) was another downregulated gene, whose paralogue ANO1 is a known cancer marker for head and neck squamous carcinoma [43]. The Prostaglandin D2 Receptor (PTGDR), also called PGD2, has been associated with different types of cancer [44], even though its role in breast cancer has not been well described. However, it has been reported that high expression of PGD2 resulted in reduced tumor proliferation [45]. This might explain why PGD2 was significantly downregulated in positive margin cases in our cohort. There is limited knowledge of PCP4L1 in malignancies.

The unsupervised clustering of these 29 genes grouped the samples primarily by subtype, although the two main clusters differed in their enrichment for margin-positive samples. A larger sample of patients with positive margins is needed to validate these genes as predictors of margin status. Since a positive margin is a strong indicator of breast cancer recurrence, these genes could in turn be considered potential markers of recurrence.

Furthermore, the pathway analysis revealed prominent pathways such as MYC_TARGETS and E2F_TARGETS that have been reported by previous studies to be associated with breast cancer recurrence (Table 6) [46]. Estrogen response-related pathways were upregulated in margin-positive samples, which is in line with our earlier observation that Luminal A had a higher chance of positive margins than the Basal subtype (Table 2).

Conclusion

The clinical data analysis using TCGA-BRCA data shows that higher stage, larger tumor size, positive lymph nodes, presence of distant metastasis, and the Luminal A subtype (compared to the Basal subtype) carry a higher chance of positive margins after first surgery. We also observed that mastectomy for tumor removal reduced the chance of positive margins compared to lumpectomy. This is in agreement with previously reported studies. However, we also found that margin status was likely a surrogate for tumor stage and, hence, patients diagnosed at a higher stage had a higher chance of positive margins regardless of the type of surgery. Based on these findings, patients with Luminal A or higher stage tumors should be counseled on their increased risk of positive margins. Clinical indications for wider margin resection for these patients would require further rigorous examination in a clinical trial prior to definitively altering guidelines. We also identified 29 genes and 8 pathways significantly differentially expressed between positive and negative margins. These 29 genes, some of which had not previously been reported to be associated with breast cancer, could serve as potential predictors of margin status. However, additional studies need to be performed on a larger sample size to validate these findings. Ongoing studies to further identify risk factors associated with positive margins will help physicians in determining treatment strategy and counseling their patients.