Introduction

Screening mammography is responsible for most diagnoses of asymptomatic ductal carcinoma in situ (DCIS) [1–3], raising concern for overtreatment of this nonlethal disease. In contrast to invasive breast carcinoma (IBC), radiation therapy (RT) has not demonstrated a survival benefit for DCIS [4], yet clinical trial subset analyses have failed to identify a patient subgroup that derives no recurrence-free survival (RFS) benefit; similarly, we cannot identify which DCIS patients benefit from adjuvant endocrine therapy [5–7]. Understanding how DCIS evolves to IBC, in terms of both genomic and temporal progression, may provide insight into addressing these screening issues.

We and others have previously performed genome-wide sequencing studies on progression of breast neoplasia, from hyperplasia to carcinoma in situ to invasive carcinoma. These studies indicate that there is a gradual somatic gain of copy number alterations (CNAs) and single nucleotide variations (SNVs) [8–12]. Our studies have examined hyperplasia, DCIS and IBC from cross-sectional samples, by both targeted sequencing [12] and whole genome sequencing [10], to identify genomic changes that occur in progression between these pathologically defined neoplasias. These data have linked specific genomic changes to pathologic lesions defined by morphology whose risks have previously been studied at an epidemiologic level [13], including common CNAs and SNVs that have been identified in IBC [14, 15]. These gradual genomic changes provide an opportunity to predict which DCIS lesions are likely to be associated with progression to IBC.

It is well recognized that risk stratifying DCIS is challenging because of its clinical and biological heterogeneity. An additional problem when considering genetic relationships (lineage analysis) and generating genetic biomarkers of risk is that the standard of surgical care for DCIS is removal of the entire lesion. Thus, studies that examine the recurrence of DCIS or emergence of IBC are not likely to directly address the genetic relationships between DCIS and IBC that are essential to our understanding of how cancer develops genetically. A cross-sectional study (examining concurrent DCIS and IBC) addresses this issue directly: the natural genetic relationships of concurrent DCIS and IBC are preserved and have not been altered by treatments. These cross-sectional samples provide a good way to test potential genetic biomarkers, such as somatic SNVs and CNAs, on a large cohort.

A number of studies have previously examined the risk of DCIS recurrence using protein expression markers [16]; however, DNA copy number changes are common in early genomic lesions and may serve as more robust biomarkers due to their insensitivity to intratumoral factors such as hypoxia. In this study, we examined the accumulation of CNAs as a biomarker for developing IBC in noninvasive neoplasia. We generated a theoretical analysis of SNV and CNA frequencies in DCIS through a simulation experiment based on IBC data from The Cancer Genome Atlas (TCGA) [14]. Since genomic change appears to correlate with progression [9], we aimed to study these changes in a large cohort at the level of the preinvasive DCIS lesion, and to characterize its association with clinical and demographic data [17, 18]. These findings may enable the development of molecular tools for DCIS risk stratification, which is an urgent clinical need.

Methods

Data resource environment and patient identification

All available cases with enough tissue for sampling were identified in the Department of Pathology at Stanford University Hospital (SUH) from 2000 to 2007 with the diagnosis of either DCIS with no development of IBC over a median follow-up of 9 years or DCIS with concurrent IBC present, based on per protocol assessment by SUH pathologists. Surgical samples with sufficient tissue were collected with Health Insurance Portability and Accountability Act (HIPAA)-compliant Stanford University Institutional Review Board (IRB) approval (Protocol numbers 19482 and 22825). Because archival tissue was used, a waiver of consent was obtained. All research was approved by SUH and the State of California IRB (for use of state cancer registry data).

Clinical data extraction and data addition

Using Oncoshare, a multisource data resource for breast cancer outcomes research, we extracted clinical data from SUH electronic medical records (EMRs) (Epic Systems, Verona, WI, USA) and from a SUH warehouse for clinical data collected before Epic implementation in 2007, the Stanford Translational Research Integrated Database Environment (STRIDE), as previously published [17, 18]. We requested state cancer registry (California Cancer Registry, CCR) records for all patients with breast cancer treated at SUH from 2000 through 2011. CCR and EMR records were linked using names, social security numbers, medical record numbers, and birthdates. All personal identifying information was removed [18].

Simulation analysis of SNV and CNA frequencies as predictors of invasive carcinoma in DCIS

We performed a simulation experiment to provide insight into the types of genomic alterations (in terms of both frequency and magnitude of association with IBC) that are most likely to be useful in a genomic predictor of IBC risk in DCIS. To construct a simulated genomic dataset, we based the sample size on the number of samples available in our study set (151 cases and 129 controls). We then created frequency-based classes of genomic alterations in DCIS and classes of differential frequencies between cases that progressed to IBC and controls that did not. We based our DCIS frequency classes on preliminary data for SNV/CNA frequencies in TCGA [14], as little is currently known about SNV/CNA frequencies in DCIS. We first created three frequency-based classes of genomic alterations in DCIS: low frequency (5 %), mid frequency (15 %), and high frequency (30 %), and four classes of differential frequencies between cases and controls: highly differential (alteration frequency is threefold higher in cases versus controls), moderately differential (alteration frequency is 1.5-fold higher in cases versus controls), low-level differential (alteration is 1.25-fold higher in cases versus controls), and nondifferential (alteration frequency is generated from the same distribution in cases and controls). 
Based on data for SNV/CNA frequencies in IBC in TCGA, we modeled 45 % (ten of 22) of our alterations as low frequency (this group is representative of low-frequency breast cancer alterations such as MLL3 mutation, PTEN mutation and GATA3 mutation), 27 % (six of 22) as moderate frequency (representative of moderate-frequency breast cancer alterations such as 11q13 gain, 8q24 gain, ERBB2 gain and CDH1 mutation), and 27 % (six of 22) as high frequency (representative of common breast cancer alterations such as TP53 mutation, PIK3CA mutation, 1q gain, 8q gain, 16p gain, 20q gain, 16q deletion, 17p deletion, 8p deletion, and 22q deletion, among other common arm-level CNAs) in the simulated DCIS samples. We modeled nine of the 22 features (41 %) as deriving from distributions with differential frequency in the cases versus controls. These nine features were equally distributed across the nine possible combinations of frequency (low, moderate, high) and magnitude of case versus control differential (low, moderate, high).

For each of 2000 iterations, we first constructed simulated case and control data sets (as described above). We then used L1-regularized logistic regression to build a predictor and performed tenfold cross-validation to select the optimal value for the λ tuning parameter. For each of the 2000 iterations, we recorded the overall model performance (area under the curve (AUC) on held-out cases in cross-validation for the top-performing value of λ), the number of active features in the top-performing model, and the population-wide frequency (low frequency, moderate frequency, high frequency) and underlying distribution (nondifferential, low-level differential, moderately differential, highly differential) that gave rise to the active features.
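As a rough illustration of a single iteration of this procedure, the sketch below draws one simulated dataset and fits an L1-penalized logistic regression to it. It is not the code used in the study. Several details are assumptions made only for illustration: the way the overall frequency is split into case and control frequencies (`group_frequencies`), the use of a simple proximal gradient solver, and a fixed penalty `lam` in place of the tenfold cross-validated λ described above. All function names are hypothetical.

```python
import math
import random

N_CASES, N_CONTROLS = 151, 129             # sample sizes quoted in the text
FREQ = {"LF": 0.05, "MF": 0.15, "HF": 0.30}
FOLD = {"ND": 1.0, "LD": 1.25, "MD": 1.5, "HD": 3.0}
N_PER_FREQ = {"LF": 10, "MF": 6, "HF": 6}  # 22 features in total


def feature_classes():
    """One differential feature per (frequency, fold) pair; the rest are ND."""
    classes = []
    for freq, count in N_PER_FREQ.items():
        classes += [(freq, d) for d in ["LD", "MD", "HD"] + ["ND"] * (count - 3)]
    return classes


def group_frequencies(overall, fold):
    """Assumed parameterisation: case frequency = fold * control frequency, with
    the sample-size-weighted mean of the two equal to the overall frequency."""
    control = overall * (N_CASES + N_CONTROLS) / (N_CASES * fold + N_CONTROLS)
    return fold * control, control


def simulate(rng):
    """Draw a binary alteration matrix X and case (1) / control (0) labels y."""
    probs = [group_frequencies(FREQ[f], FOLD[d]) for f, d in feature_classes()]
    y = [1] * N_CASES + [0] * N_CONTROLS
    X = [[int(rng.random() < (pc if label else p0)) for pc, p0 in probs]
         for label in y]
    return X, y


def fit_l1_logistic(X, y, lam, step=0.1, n_iter=200):
    """L1-penalised logistic regression by proximal gradient descent (ISTA)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, row))
            resid = 1.0 / (1.0 + math.exp(-z)) - label
            grad_b += resid
            for j, xj in enumerate(row):
                grad_w[j] += resid * xj
        b -= step * grad_b / n                      # intercept is not penalised
        for j in range(p):
            v = w[j] - step * grad_w[j] / n
            w[j] = math.copysign(max(abs(v) - step * lam, 0.0), v)  # soft threshold
    return w, b


X, y = simulate(random.Random(0))
weights, intercept = fit_l1_logistic(X, y, lam=0.02)   # lam is an arbitrary choice
active = [cls for cls, wj in zip(feature_classes(), weights) if wj != 0.0]
print(f"{len(active)} active features:", active)
```

Repeating the draw and fit many times, and tallying which (frequency, differential) classes end up with nonzero weights, would give selection proportions of the kind described above.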

Patient population and samples

Patient surgical samples diagnosed at SUH between 2000 and 2007 were selected for the presence of DCIS and constructed into a tissue microarray (TMA, TA-239) based on a previously described protocol [19, 20]. The size of the DCIS was not obtained. In brief, two experienced breast pathologists (KJ and RW) reevaluated the grading for this study and the criteria used included architectural pattern and the presence of necrosis. Samples were excluded due to paucity of material or poor preservation of material. The TMA contained one representative 0.6 mm core from 280 clinically independent tumors, 151 samples of DCIS only, and 129 samples of DCIS with concurrent IBC. Sampling of DCIS in close proximity to, or intermixed with extensive invasive cancer was avoided. A total of 271 patients with DCIS only (120 cases) or DCIS and IBC (151 cases) were included in the final analysis. Note that there were seven cases that contributed two gene profiles and one case that contributed three gene profiles. For the primary analysis, we used all 280 samples. As a sensitivity analysis, we randomly selected one sample from each case that contributed more than one gene profile, and used a total of 271 samples corresponding to the 271 unique patients.

Patient characteristics

Of the 271 patients (280 samples) with DCIS, most were 40–64 years old and diagnosed from 2000 to 2003. Most patients (73.4 %) were non-Hispanic (NH) white, with 19.6 % Asian/Pacific Islander, 3.7 % Hispanic, and 1.5 % NH black. Half (50.6 %) of the cases expressed hormone receptors (HR), and the most common grade was 2 (48 %). Among DCIS with IBC cases that had HR and human epidermal growth factor receptor 2 (HER2) status recorded, there was a roughly equivalent distribution among HR-positive HER2-negative (29.1 %), HER2-positive (35.8 %), and HR-negative, HER2-negative (triple-negative, 22.5 %) subtypes (Table 1 and see Additional file 1). HER2 gain was present in 30.8 % of the DCIS-only cases and 34.4 % of the DCIS with concurrent invasive cancer cases (Table 1). Treatments and outcomes varied somewhat by invasiveness: unilateral mastectomy was performed in 23.3 % of DCIS-only and 31.1 % of DCIS with IBC patients, whereas bilateral mastectomy was performed in 20.8 % of DCIS-only and 27.8 % of DCIS with IBC patients. These rates are consistent with a study by Worni et al., which found the rate of unilateral mastectomy in DCIS patients to be 23.4 % [21]. Only 8.3 % of DCIS-only patients had died as of 2013, versus 19.9 % of DCIS with IBC patients (see Additional file 2).

Table 1 Characteristics of 271 patients with ductal carcinoma in situ (DCIS), with and without invasive breast cancer

Fluorescence in situ hybridization

Fluorescence in situ hybridization (FISH) was performed to examine chromosome 1q32, 8q24 and 11q13 gains. The genomic loci targeted were chosen based on the simulation results (see Results) and their frequency in invasive cancers from The Cancer Genome Atlas (TCGA) data [14]. We used 4 μm formalin-fixed, paraffin-embedded sections cut from the constructed TMA, based on a protocol previously described [22]. Briefly, BAC clones RP11-1044H13 (1q32), RP11-1136L8 (8q24.21) and RP11-94L15 (17q12) were obtained from the BACPAC Resources Center (Children’s Hospital Oakland Research Institute, Oakland, CA, USA), while clone CTD-2537F6 (11q13.3) was acquired from Invitrogen/Life Technologies (Grand Island, NY, USA). Probes RP11-1044H13 (1q32), RP11-1136L8 (8q24.21) and CTD-2537F6 (11q13.3) were labeled with Cy3 dUTP (cat number PA53022 GE Healthcare, Pittsburgh, PA, USA), and control probes RP11-1120M18 (3q25) and CTD-2344F21 (2q37) were labeled with AlexaFluor 647-aha-dUTP (cat number A32763 Life Technologies) and Green dUTP (cat number 02N32-050 Abbott Molecular, Des Plaines, IL, USA), respectively, using the Nick Translation Kit (cat number 07J00-001 Abbott Molecular).

Scoring FISH

Imaging and analysis were performed using Ariol 3.4v software (Genetix/Leica Microsystems, San Jose, CA, USA). Fluorescence was scored visually using filters Cy3 dUTP (green: 550 nm), AF 647 dUTP (red: 647 nm), and Green dUTP (yellow: 488 nm). Within the DCIS cells, total signals for each color within a given slide region were counted. Invasive carcinoma cells and nonneoplastic cells were excluded from the analysis. Signals from 100 cells per sample were counted, when possible, with a minimum of 40 cells counted in all cases. The test probes were individually hybridized with the two control probes for each genomic locus to determine copy number gain. Total test probe green counts (1q32, 8q24.21, 11q13.3 or 17q12) were compared with red (3q25) and yellow (2q37) control counts, which are frequently unaltered in breast cancer [14, 15]. The signals were scored according to two parameters: signals per cell and ratio of test probe to control probes. Only the DCIS components were scored and compared across cases, which were either DCIS alone or DCIS with concurrent IBC. Cases were scored as heterogeneous if at least 25 % of the scored DCIS cells had a different signal call. Cases were scored as gained if the target to control probe ratio was greater than 1.5 or the number of test signals was greater than three per cell. These scoring criteria were based on our previous study, in which we examined the HER2 copy number in a large cohort of breast cancers and found that a gain of greater than 1.5 was the cutoff value that best correlated with a worse outcome [23]. Cases were scored as deleted if the target to control probe ratio was less than 0.75, or if more than 25 % of the DCIS cells scored had a target to control probe ratio of less than 0.75. For consistency, we scored HER2 gain according to the criteria set forth above for the three genomic loci investigated.
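The gain and deletion thresholds above can be written as a small decision rule. The following sketch is illustrative only and is not the scoring software used in the study. It assumes that the two control probes are combined by taking their mean and that the gain rule is checked before the deletion rule, neither of which is specified in the text. The heterogeneity call is omitted, and the names `Nucleus` and `score_case` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Nucleus:
    test: int        # test-probe signals (e.g. 1q32, 8q24.21, 11q13.3 or 17q12)
    control_a: int   # 3q25 control signals
    control_b: int   # 2q37 control signals

    def ratio(self):
        """Test signals over the mean of the two controls (assumed combination)."""
        mean_control = (self.control_a + self.control_b) / 2
        return self.test / mean_control if mean_control else float("inf")


def score_case(nuclei, min_cells=40):
    """Return 'gained', 'deleted' or 'normal' for the DCIS nuclei of one case."""
    if len(nuclei) < min_cells:
        raise ValueError(f"need at least {min_cells} scored cells, got {len(nuclei)}")
    total_test = sum(n.test for n in nuclei)
    total_control = sum(n.control_a + n.control_b for n in nuclei) / 2
    if total_control == 0:
        raise ValueError("no control signals counted")
    case_ratio = total_test / total_control
    signals_per_cell = total_test / len(nuclei)
    low_fraction = sum(n.ratio() < 0.75 for n in nuclei) / len(nuclei)

    # Assumed precedence: the gain rule is checked before the deletion rule.
    if case_ratio > 1.5 or signals_per_cell > 3:
        return "gained"
    if case_ratio < 0.75 or low_fraction > 0.25:
        return "deleted"
    return "normal"


# Hypothetical case: 40 nuclei, each with four test signals and two of each control.
print(score_case([Nucleus(4, 2, 2)] * 40))   # gained (ratio 2.0 > 1.5)
```

Note that a nucleus with four copies of every probe has a ratio of 1.0 but is still called gained under the signals-per-cell rule, which is why the two parameters are evaluated separately.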

Immunohistochemistry

HER2 immunoreactivity was evaluated by immunohistochemistry (IHC). The TMA was cut into 4-μm-thick sections, deparaffinized, hydrated, subjected to Cell Conditioning 1 (CC1, Ventana Medical Systems, Tucson, AZ, USA) antigen retrieval and stained with a prediluted anti-HER2 antibody (Rabbit, Clone 4B5, Ventana Medical Systems number 790-2991) using an automated immunostainer. HER2 expression was scored according to the 2013 American Society of Clinical Oncology/College of American Pathologists HER2 Test recommendations [24].

Statistical analyses

We used logistic regression techniques to characterize the association between copy number gains at three loci and IBC among patients with DCIS. The multivariable model on which our primary analysis was based additionally included age at diagnosis, race, hormone receptor status, grade of the DCIS component and HER2 gain, in order to gauge the association of copy number status and IBC after adjusting for demographics and relevant clinical variables. A complete-case analysis was based on a model that included subjects who had data on all variables specified (N = 158). As a sensitivity analysis, we additionally employed multiple-imputation methods with ten imputed data sets (mi impute chained in Stata) to retain all subjects in the study even if they were missing one of the variables specified in the model (N = 271) (Stata Statistical Software. Release 13. StataCorp LP, College Station, TX, USA). A two-sided Wald test was conducted at the 0.05 level to assess the significance of the association. Odds ratios and 95 % confidence intervals were used to characterize the magnitude of the association. As a further sensitivity analysis, we randomly selected one sample for cases with multiple gene profiles and repeated the logistic regression analyses with 154 cases in the complete-case analysis and 271 cases in the analysis that employed multiple imputation.
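For a single binary marker, the odds ratio, Wald confidence interval and two-sided Wald test from a logistic regression reduce to closed-form expressions on the 2 × 2 table of marker status by diagnosis. The sketch below illustrates that univariate special case only. It does not reproduce the multivariable or multiple-imputation models, which were fitted in Stata. The function name and the counts in the example are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_wald(a, b, c, d, z_crit=1.959964):
    """Odds ratio, 95 % Wald CI and two-sided Wald p value for a 2x2 table.

    a: marker-positive cases      b: marker-positive controls
    c: marker-negative cases      d: marker-negative controls
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)   # SE of the log odds ratio
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))      # two-sided normal tail
    ci = (math.exp(log_or - z_crit * se), math.exp(log_or + z_crit * se))
    return math.exp(log_or), ci, p_value


# Hypothetical counts, for illustration only.
odds_ratio, (lower, upper), p = odds_ratio_wald(30, 10, 20, 40)
print(f"OR = {odds_ratio:.2f}, 95 % CI {lower:.2f} to {upper:.2f}, p = {p:.5f}")
```

With adjustment covariates the coefficients no longer have a closed form and must be estimated iteratively, but the Wald interval is built from the fitted coefficient and its standard error in the same way.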

Results

Simulation analysis of SNV and CNA frequencies as predictors of invasive carcinoma in DCIS

We conducted a simulation experiment to determine the types of genomic characteristics (based on frequency and association with IBC) most likely to be useful features in a predictive model of IBC risk in DCIS. The results from this analysis suggest that genomic features with moderate-to-high overall frequency (15–30 %) and high differential frequency between cases versus controls (threefold) are likely to be selected as active features in the predictive model, while lower frequency alterations and alterations with weaker associations with IBC are much less likely to be informative in a risk-prediction model (Fig. 1). For example, the simulated genomic features with moderate-to-high frequency and strong association with IBC were selected in ≥ 99 % of the iterations, while simulated genomic features with strong association with IBC but low population frequency were selected in only 65 % of the models (Fig. 1). The frequency of recurrent genomic alterations in IBC varies greatly between SNVs and CNAs, with fewer than eight SNVs occurring at a frequency greater than 5 % while more than 30 CNAs occur at a frequency greater than 15 % [15].

Fig. 1

Genomic predictor simulation experiment. We created three frequency-based classes of genomic alterations in ductal carcinoma in situ (DCIS): low frequency (LF) (5 %), mid frequency (MF) (15 %), and high frequency (HF) (30 %), and four classes of differential frequencies between cases and controls: highly differential (HD) (alteration frequency is threefold higher in cases versus controls), moderately differential (MD) (alteration frequency is 1.5-fold higher in cases versus controls), low-level differential (LD) (alteration is 1.25-fold higher in cases versus controls), and nondifferential (ND) (alteration frequency is generated from the same distribution in cases and controls). For each of 2000 simulations, we used L1-regularized logistic regression to build a predictor and performed tenfold cross-validation to select the optimal value for the λ tuning parameter. For each simulation, we recorded which types of features in terms of frequency (LF, MF, HF) and differential status (ND, LD, MD, HD) were active in the model. The results are displayed in the figure, with the feature types along the X-axis and the proportion of simulations in which each feature type was active in the model along the Y-axis

Based on our modeling results, we expect that the majority of genomic features in a successful genomic classifier will be CNAs with fewer, if any, SNVs. As such, we decided to investigate the association between three of the most common and recurrent IBC-associated CNAs (gains of genomic regions of 1q, 8q24, and 11q13) and IBC risk in DCIS.

Univariate exploratory analyses of chromosomal gains in DCIS with or without invasive cancer

We examined the presence of copy number gains in three chromosomal loci, 1q, 8q24, and 11q13, by FISH in 280 samples diagnosed as DCIS only (122 cases with no development of IBC over a median follow-up period of 9 years), or DCIS plus IBC (158 cases) (Table 2) arrayed on a TMA. We chose to study a set of loci (1q, 8q24, and 11q13) which have a high frequency of copy number gains (>30 %) among at least two molecular breast cancer subtypes [14, 15]. The prevalence of gains at all three genomic loci in the two groups of DCIS together was lower than values previously reported in IBC [15]. Overall copy number gain frequency was as follows: 1q at 52 % (compared to 64 % in IBC [15]), 8q24 at 44 % (compared to 60 % in IBC [15]), and 11q13 at 20 % (compared to 32 % in IBC [15]). Low copy number gains (one to two additional copies) represented the vast majority of copy number alterations at 1q and 8q24 (80 % and 78 %, respectively). In contrast, 11q13 had roughly equal numbers of low (53 %) and high (47 % with > 2 additional copies) gains (Fig. 2). When stratifying DCIS by the presence of concurrent IBC, the prevalence of copy number gain was higher in DCIS with concurrent IBC than in DCIS alone at all three genomic loci individually (1.35- to 3-fold), in combinations, and with all three copy number gains together (Table 2). We examined the co-existence of HER2 gain and the other three loci gains in both diagnostic groups. The overall copy number gain frequency of HER2 was 32.9 %. The prevalence of HER2 gain was higher in DCIS with concurrent IBC versus DCIS alone (Table 2).

Table 2 Chromosomal gains in ductal carcinoma in situ (DCIS) with and without invasive cancer
Fig. 2

a Hematoxylin and eosin (H&E) image of ductal carcinoma in situ (DCIS) with 11q13 gain. b Fluorescence in situ hybridization (FISH) image of DCIS with high level of copy number gain

After finding the chromosomal gains of 1q, 8q24 and 11q13 to be increased in DCIS in the setting of IBC compared to DCIS only, we tested whether these gains are associated with IBC (Table 3 and see Additional file 3). We found statistically significant differences in distribution of copy number gains between the two diagnostic groups in all three regions when examined individually, in combination, and with all three copy number gains. The sensitivity for each of the three regions alone ranged from 37.9 to 58.4 %, with high specificity for the combinations of gains of 1q and 11q13 (88.2 %); 8q24 and 11q13 (91.8 %); and all three copy number gains (93.2 %). The combination of 8q24 and 11q13 gains demonstrated the highest positive predictive values at 79.4 %. When we examined the co-existence of copy number gains of HER2 and the other three genomic loci, we found a statistically significant difference in the frequency distribution between the two diagnostic groups for the cytogenetic combination of 1q, 11q13 and HER2 (p = 0.038, Table 3). The sensitivity for this combination performed at the low end when compared to the three copy number gain combinations at 25.8 %, with a specificity of 87.2 % and a negative predictive value of 48.6 % (Table 3).
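The performance measures reported here follow the standard definitions for a 2 × 2 table, where a test-positive DCIS sample is one carrying the gain (or combination of gains) and the reference condition is the presence of concurrent IBC. A minimal sketch of those definitions is given below. The function name and the counts are hypothetical and are not the values in Table 3.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV and NPV from 2x2 counts.

    tp: gain present, concurrent IBC     fp: gain present, DCIS only
    fn: gain absent, concurrent IBC      tn: gain absent, DCIS only
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Hypothetical counts, for illustration only.
for name, value in diagnostic_metrics(tp=40, fp=10, fn=60, tn=90).items():
    print(f"{name}: {value:.1%}")
```

Requiring several gains at once shrinks the test-positive group, which tends to raise specificity and lower sensitivity, the pattern described for the combinations above.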

Table 3 Performance of cytogenetic combinations as predictors of invasive breast cancer

Multivariable logistic regression and classifier analyses predicting invasive cancer among DCIS cases

To characterize the association between copy number gains in DCIS and IBC, we applied multivariable models of IBC as a function of a six-level categorical variable describing chromosomal gains at regions 1q, 8q24 and 11q13, along with age at diagnosis, race, hormone receptor status, histological grade and the presence of HER2 gain (Table 4 and see Additional file 4). The association between copy number gain and IBC was statistically significant in both complete-case analysis and multiple-imputation (MI) analysis (p = 0.0013 and 0.0001, respectively) and shows that subjects with gains at all three loci are 18 times more likely to have an IBC diagnosis than subjects without gains at these loci; subjects with exactly two copy number gains are nine times more likely to have an IBC diagnosis, and subjects with 8q24 gain only are 4.2 times more likely to have IBC than subjects with no gain in these regions (MI analysis). Interestingly, the genomic copy number gain, age at diagnosis and HER2 gain were the only statistically significant variables in the model. Of note, HER2 gain is not significantly associated with invasive cancer in the univariate analysis, but is inversely associated in the multivariable analysis, in which subjects with HER2 copy number gain were significantly less likely to have an IBC diagnosis (odds ratio 0.47, p = 0.039) than subjects without HER2 gain (Table 4). In addition, we examined HER2 “high” amplification (defined as > 10 copies per nucleus) and HER2 strong positivity (defined as 3+ IHC staining), and neither of these variables was significantly associated with invasive cancer on either univariate or multivariable analyses.

Table 4 Univariate and multivariable logistic regression analyses predicting invasive breast cancer among ductal carcinoma in situ (DCIS) cases

Discussion

This study demonstrates that genomic changes can act as a risk stratifier for DCIS, predicting the presence of concurrent IBC. We observed no significant differences between DCIS patients with and without concurrent IBC in standard clinicopathologic factors of race, hormone receptor status and histological grade. By contrast, we did find significantly higher frequencies for copy number gains at 1q, 8q24 and 11q13 with any two of three genomic loci and all three genomic loci in patients with DCIS and concurrent invasive cancer when compared to DCIS only. Multivariable analysis showed that gains at the three regions were significantly associated with IBC among patients with DCIS, after adjustment for important clinical variables including grade, hormone receptor status and even HER2 copy number gain, which was associated with a lower risk of having invasive cancer and is consistent with prior publications on DCIS [25]. Furthermore, we show that this is a feasible method, utilizing standardized FISH techniques, and as such has high potential to address the critical unmet need for accurate risk stratification and personalized treatment of DCIS.

Population-wide screening mammography has largely created the problem of diagnosing asymptomatic DCIS [1–3]; concerns about overtreatment have lent support for replacing “DCIS” with “ductal intraepithelial neoplasia”, emphasizing the indolent behavior of many of these lesions [26]. However, since we cannot predict which DCIS lesions will progress to invasive cancer, treatment guidelines recommend mastectomy or breast-conserving therapy plus RT, followed by adjuvant tamoxifen: this approach is excessive treatment for most patients [7, 27–30]. Previous attempts at risk stratification, using protein expression markers such as p16, Ki67 and COX [16], or an RT-PCR assay that estimates the risk of local recurrence [31], are limited by problems of intratumoral variability and reliance upon IBC rather than DCIS for gene selection. Some genetic changes occur early in tumorigenesis and therefore are likely present in most of the neoplastic population at more advanced stages like DCIS [10]. DNA copy number changes are common in early genomic lesions and may be more robust as biomarkers than gene expression levels, which can be subject to heterogeneity due to intratumoral factors such as hypoxia. At the molecular level, CNAs and SNVs have been described previously in breast cancer [14, 15], and their application and integration into clinical practice is appealing. Our present modeling results show that CNAs are more likely to be prognostic than SNVs based on their frequency in IBC [15].

Our approach aimed to optimize practicality for ultimate translation to patient care. We used TMA technology because the amount of DCIS in each core is similar to the amount present in conventional breast biopsies. We used FISH to measure CNAs as this approach can generate single-cell measurements in a complex tumor microenvironment with multiple cell types present. Although molecular techniques are sensitive for detection and quantification at the SNV level [32], critical morphological correlation is lost. Our use of FISH on TMAs avoids this limitation, resulting in more precise genomic copy number data. The FISH technique is also currently used in the clinics to measure HER2 in breast cancer (and other more subtle genomic alterations in other neoplasia) and thus this approach may be easily adopted by most clinical laboratories.

Our cross-sectional study approach has limitations and advantages over a longitudinal approach. While a cross-sectional study does not allow for the evaluation of recurrence, we have a median follow-up of 9 years for the DCIS-only cases, a timeframe consistent with previous studies examining the recurrence rates of DCIS [33]. While the challenges of clinical biomarker assessment will ultimately be addressed with longitudinal cohorts of DCIS that progress over time to IBC, longitudinal cohorts do not address the genetic relationships between DCIS and IBC that are essential to our understanding as to how cancer develops. The problem with longitudinal cohorts is that the initial DCIS should be entirely removed at the time of the definitive surgical treatment. Therefore, the resulting subsequent recurrence, either DCIS or IBC, is likely not directly related to the primary DCIS. Given that the surgical treatment of the primary DCIS is to entirely remove the DCIS but not necessarily remove potentially related lesions of lesser risk (e.g., hyperplasias), a longitudinal DCIS cohort study would be more reflective of the risk potential of the associated lesser risk lesions that are not entirely removed. Alternatively, the recurrence may be directly related to the primary DCIS if the surgical resection is incomplete. However, genetic biomarkers generated from this scenario would not be related to intrinsic features of the primary DCIS but rather more complex treatment effects such as the clinical and radiologic appreciation of the extent of the disease. It is also possible that a clonally related neoplastic precursor, such as atypical ductal hyperplasia or columnar cell change, is present at the surgical margin and that residual part of this lesion progresses to the recurrent carcinoma. This would explain the observation that the recurrences typically occur in the same quadrant of the breast. 
This scenario is compatible with our lineage evolutionary tree analyses as determined by whole genome sequencing, where we can identify precursors in both columnar cell lesions and atypical ductal hyperplasia that are clonally related to both the concurrent ductal carcinoma in situ and the invasive carcinoma [10]. It is also possible that there is a nonneoplastic field effect, localized to that quadrant that is responsible for the recurrence. Additional studies on the lineages of the initial and recurrent lesions will be required to understand this fully. The main limitation of this cross-sectional approach is that it does not address the important clinical scenario of whether a patient with DCIS alone will eventually develop IBC. This is clearly an important question to address. However, as noted above, this question is less about the intrinsic features of DCIS than about the features of the neoplasia (e.g., hyperplasia) that remains unresected at the time of definitive surgery. Prior to tackling that question, it is useful to identify, on an evolutionary level, whether genomic changes within DCIS and its evolutionary ancestors predict the development of IBC. From this perspective of identifying features in DCIS that predict risk, a cross-sectional study is appropriate as the natural evolutionary relationships between DCIS and IBC are retained.

Although the FISH assay we developed did identify high-risk DCIS cases, there are multiple subtypes of IBC and likely multiple corresponding subtypes of DCIS, and as such different combinations of markers may be needed for risk stratification of different DCIS subtypes. Our understanding of the different pathways involved in the development of IBC and specific genomic alterations therein is growing. Low- and high-grade neoplasias demonstrate different CNAs [34]. In addition, PIK3CA mutation occurs early in oncogenesis and is associated with ductal hyperplasias, while TP53 mutations at early stages have not been found [35, 12]. Furthermore, NOTCH/MAST fusions have been described in cases of DCIS associated with IBC [36]. This growing knowledge will serve to guide future studies of the approach we present here.

Conclusions

In conclusion, our proof-of-principle study demonstrates the feasibility of a novel genomic predictor of breast cancer risk using data derived from TCGA, and characterizes its performance in the context of patient demographic and clinical factors. The three FISH assays for 1q, 8q24 and 11q13 positively identified a subset of high-risk DCIS patients; if expanded and validated in prospective trials, this approach, which can be integrated into routine clinical practice readily, may ultimately improve the care of patients with early breast neoplasia.