Introduction

Extensive copy number aberrations (CNA) are a hallmark of cancers with genome instability and are observed among a wide variety of epithelial malignancies originating from the colon, breast, cervix, prostate, bladder and stomach1. High levels of CNAs are associated with cancer progression and poor prognosis. Thus, there is general interest in profiling CNAs as potential biomarkers associated with specific clinical outcome.

The focus of our study was colorectal cancer (CRC), the third most common cancer world-wide, with ~1.8 million estimated new cases yearly. The majority of CRCs demonstrates an extensive CNAs and as a result, are designated as belonging to the chromosomal-instability (CIN) molecular subtype. To accurately profile CNAs and evaluate their prognostic significance in CRC, a variety of methods have been used which include karyotyping, fluorescent in-situ hybridization (FISH), and chromosomal microarrays, such as comparative genomic hybridization (CGH) arrays and single nucleotide polymorphisms (SNP) arrays.

Citing some examples, with karyotyping, Bardi et al. found that loss of chromosome 18 was correlated with shorter overall survival in early-stage patients (N = 150)2. Personeni et al.3 using FISH identified that changes in EGFR copy number predicted overall survival for EGFR-targeted therapy in a metastatic CRC cohort (N = 87). Using SNP-arrays, Sheffer et al. identified deletions of 8p, 4p, and 15q were associated with poor survival in a mixed-stage cohort (N = 130)4. Other microarray-based studies5,6,7 using smaller numbers of patients (N <100) identified various CNAs associated with poor survival that included sub-arm losses of 1p, 4p, 5, 6, 8p, 10, 14q, and 18. In contrast, based on CGH-array analysis, Rooney et al.8 reported that no specific CNA was significantly associated with survival in Duke’s C-stage CRCs (N = 29). The lack of concordance among these studies reflects the clinical stage variation among the study cohorts, the inherent limits of the molecular methods used for detecting CNAs and suggests clinical stage-specific variation among the study cohorts.

More recently, researchers have employed whole genome (WGS), whole exome (WES), and targeted sequencing for high-resolution analysis to profile CNAs. WGS has significant advantages over the other approaches because it provides whole-genome coverage without targeting and capturing as compared to other methods including microarrays and exomes. Exome and targeted sequencing have technical biases due to the extra DNA amplification and hybridization steps. Furthermore, these methods with their emphasis on gene targets cover only a small proportion (<3%) of the genome and thus miss significant portions of the noncoding genome which are noncoding. Conducting high-coverage WGS is costly even when considering recent cost reductions in sequencing. It also generates large data sets that incur significant informatics cost.

To overcome some of the challenges of conducting cancer WGS studies on populations for CNA profiling, we developed a low-coverage whole genome approach that provided highly accurate genome-wide copy number results. As a result, this leads to a significantly lower per sample cost than conventional WGS. For this study, we used an average sequencing coverage of 2–4x, which is chosen in the range established by The Cancer Genome Atlas (TCGA)9 and the 1000 Genomes Project studies10. This low-coverage WGS, however, does not identify somatic mutations or determine copy number neutral loss-of-heterogeneity events, which should be ascertained by parallel assays, such as WES or targeted sequencing panels. Our study was solely focused on identifying CNAs that were associated with prognosis using this low-coverage WGS approach.

We optimized the WGS approach for sequencing formal fixed paraffin embedded (FFPE) samples, thus enabling our approach to be widely used for archival pathology biopsies. We verified the accuracy of CNA segments generated with this approach using a systematic benchmark with microarray, WES and high-coverage WGS. Overall, development of this low-coverage WGS enabled us to analyze the entire genome for CNAs while reducing the sequencing cost and bioinformatic workload.

We applied this WGS approach to a Stage III CRC cohort. Among the different clinical stages of CRC, there is a particular interest in identifying CNAs that indicate poor prognosis for individuals with Stage III disease, where local lymph node involvement is present without imaging or pathologic evidence of distal metastasis. These patients routinely receive adjuvant chemotherapy and yet, there is a significant fraction that show recurrence despite receiving adjuvant treatment after complete resection of their cancers. Identifying Stage III patients at high risk for recurrence may prove useful in targeting these individuals for more effective adjuvant regimens and developing more sensitive screening protocols for detecting early metastasis.

We conducted a WGS analysis on a discovery cohort of Stage III CRCs (N = 134). We determined whether any specific CNAs were associated with a poor survival within the cohort. We validated our results with an independent cohort of Stage III CRCs from the Cancer Genome Atlas project (N = 187). Our findings provided additional evidence to support that specific CNAs are predictive for CRC progression, having identified a specific genomic deletion that is predictive for lower overall survival.

Results

Copy number calling

We benchmarked genome segmentation by bioinformatics tools such as CNVkit and Bic-seq on low-coverage WGS data. To do so, we randomly selected a WGS data set of 10 tumor-normal matched CRCs from the TCGA (Fig. 1). Their copy number analysis data were also publicly available from SNP microarrays. Using either caller, we observed that the WGS-derived genome segments were highly correlated with the microarray segments, which were considered as ground-truth (Fig. 2). The genome-wide tile-based average PCC were, 0.966 and 0.963 for CNVkit and Bic-Seq, respectively, and the gene-based ones were 0.943 and 0.938. We observed no statistically significant difference between PCC metrics of CNVkit and Bic-seq (Wilcoxon’s P = 0.68). We selected CNVkit because of its consistent performance with a relatively smaller median absolute deviation (i.e. intra-tile deviation metrics): 0.0088 vs 0.0113.

Figure 1
figure 1

The study design and whole genome sequencing analytical workflow. (A) The whole-genome sequencing (WGS) analysis share the same sample preparation, DNA extraction and quality control steps as WES (color shaded light green). The prepared genomic DNA libraries are pooled for WGS directly, while they require additional PCR amplification and hybridization steps to generate exomic libraries for pooled WES. (B) We integrated CNVkit, Gistic2 and various R packages to perform copy number segmentation, CNA calling and biomarker discovery analyses.

Figure 2
figure 2

Benchmarks to evaluate low-coverage WGS approach and bioinformatics. (A) Pearson’s correlation coefficients (PCC) between low-coverage WGS and microarray segments as stratified by segmentation tools; (B) PCC between low-coverage and high-coverage WGS as stratified by tumor purity; (C) The means of robust standard deviation (MAD) as stratified by low-coverage WGS and WES analysis platforms.

We examined the potential variability and bias as related to low average coverage. For this evaluation, we sequenced eight pilot tumor-normal pairs at a higher coverage (~30x) and randomly down-sampled these same samples to ~3x. We found a high correlation between the low- and high-coverage WGS segments with an average PCC at 0.93 (Fig. 2). We identified that only three samples that had PCC < 0.9 all had tumor purity <40% (Wilcoxon’s P = 0.036), suggesting that the sensitivity of detection was variable in samples with a lower overall tumor fraction. Our results point to low-coverage WGS derived segments are highly concordant with high-coverage WGS derived segments, which is also apparent from visual inspection of the segmentation tracks compared for the same sample (Supplementary Fig. S1).

We compared genome segmentation results between low-coverage WGS and WES platforms. There was general agreement between methods although the WES CNV estimates had a higher level of noise. Within a genome tile size of 100 kb, we did not generally expect abrupt copy number changes within a tile; therefore, the intra-tile CNR deviation were a result of experimental variability as related to the sequencing preparation. We found that the mean intra-tile copy number ratio deviation (as measured by median absolute deviation) was much higher in WES derived segmentations compared to the low-coverage WGS (Wilcoxon’s P = 0.00018). Specifically, the intra-tile deviation metric was as small as <0.1 for WGS while it was 1.09 for WES. Our results identified that WES-derived segments was highly variable, which is apparent from visual inspection of the segmentation tracks compared to WGS from the same sample (Supplementary Fig. S2).

Copy number features associated with clinical parameters

Using a discovery cohort of 134 tumor-normal pairs, we conducted WGS analysis and identified CNAs. The cohort consisted of Stage III CRCs originating from patients that were diagnosed between 2001 and 2015 (Table 1). We examined demographic (gender, ethnicity, age) and relevant clinical variables (treatment, sideOfColon, cancerGrade, recurrence, BMI, smokingStatus) associated with overall survival. The right side of colon was defined as any of “ascending colon”, “appendix”, “cecum”, “hepatic flexure”, “transverse” and the left side of colon was defined as any of “descending colon”, “rectosigmoid junction”, “sigmoid colon”, “splenic flexure”. Treatment was defined as a binary variable, which is a ‘yes’ if a patient received adjuvant chemotherapy – that is any treatment occurred with either 5-fluoruracil-based” (5-FU) or capecitabine regimen.

Table 1 Summary statistics and multivariate cox regression results for the Stage III colorectal cancer discovery cohort.

Among these Stage III CRC patients, 74 (55%) received adjuvant chemotherapy – treatment occurred with either 5-fluoruracil-based (5-FU) or capecitabine regimen. Patients undergoing adjuvant chemotherapy had a statistically significant improved survival as expected11,12, with a relative reduction of death risk >58% (HR = 0.416, 95% CI = [0.239, 0.724], P = 0.00195). These patients had a median overall survival of more than a year longer than that of patients who did not receive any therapy.

Stage III CRC patients with a primary tumor located in the left colon had a statistically significant improved survival with a relative reduction of death risk >42% (HR = 0.576, 95% CI = [0.342, 0.971], P = 0.0382). These patients had a median overall survival that was approximately one-half year (181 days) longer than that of patients had tumor occurred to the right of the colon.

Analysis of recurrent arm-level CNAs

We observed extensive focal- and arm-level CNAs among the Stage III CRC discovery cohort (Fig. 3). As has been widely reported and is seen with samples exhibiting CNAs, >85% of CRC are CIN phenotype and this percentage is even higher for advanced CRCs for CIN’s known to be associated with poor prognosis. The Gistic2 analysis revealed that more than half of the samples showed CNA gains of chr7, chr8q, chr13q and chr20q. Approximately half of the samples had CNA loss in chr17p and chr18.

Figure 3
figure 3

Copy number profiles of the discovery cohort. Copy number ratios (CNR) are shown for upper split panel: patients had chr17p loss; and lower split panel patients had no chr17p loss – all based on Gistic2 calls. Row color coding: black for shorter survival patients (the lower 50%) and grey for longer survival patients (the upper 50%).

The frequency of recurrent CNA gains or losses identified in the discovery cohort were concordant with the frequency observed in the TCGA stage III validation cohort. With a q-value of <0.25 and a 99% confidence as our statistical thresholds (Gistic2), we identified 21 arm-level CNAs that were statistically significant (Fig. 4). Seventeen CNAs were also significantly recurrent in the Stage III TCGA cohort. These included amplifications of chr1q, chr7, chr8q, chr13q and chr20, and deletions of chr1p, chr4, chr5q, chr15q, chr17p, chr18, chr21q and chr22q. Overall, the concordance of statistically significant recurrent arm-level CNAs between the two cohorts was >81%.

Figure 4
figure 4

Arm-level chr17p loss predicts for poorer survival in Stage III CRC. (A) Venn diagram for shared arm-level CNAs between the discovery and TCGA validation cohorts. (B) The Kaplan–Meier plots of the Stage III CRC discovery cohort as stratified by patients’ status of carrying the chr17p arm loss (SCNA_CHR_ARM_17p_del = 1 for yes, otherwise 0). Also shown are box plots for comparing (C) number of focal CNAs (D_N_Focals) and (D) number of arm-level CNAs (D_N_Arms) between patients carrying or not carrying the chr17p loss.

Chromosome arm 17p loss is associated with poor overall survival

We applied lasso-regularized multivariate cox regression to select for significant arm-level events predictive of patients’ overall survival (Methods). We identified the loss of chr17p as the single and most significant arm-level event associated with overall survival. This significance was apparent using a non-zero regression coefficient. By our final multivariate model (Eq. 2), the loss of chr17p was a significant predictor of lower overall survival (P = 0.0160, HR = 1.706, 95% CI = [1.104–2.635]). Any patient carrying 17p deletion was associated with a 68.8% increased risk of death. In addition, an increased risk was evident examining the resulting Kaplan-Meijer curves when patients were stratified by their 17p loss status (Fig. 4).

Other factors including gender, ethnicity and age at diagnosis were not associated with overall survival. After adjusting for 17p-loss, the patients having left sided CRC still had a relative reduction of death risk >33%. However, the effect ceased to be statistically significant with a marginal value (P = 0.0727). This result was likely due to a sensitivity limit imposed by the sample size. Nonetheless, patients receiving adjuvant therapy, i.e. treatment, (P = 0.002, HR = 0.478, 95% CI = [0.299–0.765]) were benefitting from a significant reduction of risk of death (52.2%) after adjusting for 17p-loss. This statistically significant difference in survival is also visible as an enrichment of shorter survival patients within the strata of carrying 17p-loss (Fig. 3).

For our validation analysis, we employed the same multivariate Cox model to the TCGA cohort data. Using this completely independent data set, we confirmed that the loss of 17p was associated with poor survival (P = 0.0126, HR = 2.357, 95% CI = [1.202–4.621]). A patient with Stage III CRC and a 17p deletion had a 135% increased risk of death in the TCGA cohort. No other genomic, demographic and clinical factors were significant, including gender, ethnicity and age at diagnosis, except for treatment (P = 0.00141, HR = 0.322, 95% CI = [0.161–0.646]). Receiving adjuvant chemotherapy treatment was associated with a 67.8% improvement of survival, a result that was concordant with our findings from the discovery cohort and previous clinical trials.

Using our discovery cohort, we examined whether patients treated with adjuvant treatment had a lower overall survival based on 17p copy number. A 17p-loss was associated with an increased death risk at HR = 1.457, CI = [0.855,2.482], P = 0.166. Thus, a trend towards lower survival in the setting of adjuvant therapy was noted but the sample size was too small to reach statistical significance.

Chromosome arm 17p loss is associated with increased chromosomal instability

TP53, a critical tumor suppressor involved in genome stability, is located on the 17p arm. Somatic alterations on TP53 were found to increase genome instability across cancer types in the literature, see Donehower et al.13 for a recent example. To determine if increased chromosomal instability is a potential mechanism explaining 17p-loss’s association with shorter survival, we performed additional analyses.

First, we excluded the microsatellite instability (MSI) subtype patients from our discovery cohort and repeated the survival analysis. MSI tumors are genome stable, thus lacking CIN and have better survival. We excluded 11 (~9%) MSI patients from our discovery cohort, who were either clinically tested as MSI-high by polymerase chain reaction (PCR) -based amplification of microsatellite repeats or tested positive for mismatch repair (MMR) Immunohistochemistry, which includes MLH1, MSH2, MSH6, and PMS2 proteins.

We repeated our analysis with the multivariate model (Eq. 2) on the remaining 123 CIN patients and the loss of chr17p was found to be a significant predictor of lower overall survival with a slightly higher risk (P = 0.0124, HR = 1.785, 95% CI = [1.134–2.811]). Any patient carrying 17p deletion was associated with a 78.5% increased risk of death. The increased risk was also clear by examining the resulting Kaplan-Meijer curves when the remaining CIN patients were stratified by their 17p loss status (Supplementary Fig. S3).

We also compared the focal- and arm-level CINs (as represented by the total counts of CNAs) between patients carrying or not carrying the 17p loss (Fig. 4). We consistently found consistently higher CIN scores in patients carrying the 17p loss (one sided T-test, P = 2.6e-9 for focal and P = 2e-10 for arm-level CIN, respectively). This increase is notable in Fig. 3, where more than >63% patients who had chr17p loss also had significant copy number changes globally indicative of a significant CIN phenotype (in the upper 50% percentile of CIN).

Discussion

There is a broad interest in determining which CNAs indicate poor prognosis for individuals with Stage III CRC. This study represents one of the most comprehensive copy number analysis of Stage III CRC with the benefit of using WGS and its associated complete genome coverage and higher resolution. With these WGS results, we identified stage-specific CNA prognostic markers for Stage III CRC. Our analysis identified the loss of chromosome 17p arm, spanning TP53, as a potential biomarker for poor survival in Stage III CRC. Our results were independently validated by the TCGA cohort. Patients with 17p/TP53 deletion in their CRC tumors have 1.6 times relative death risk in general (1.8 times for non-MSI), as compared to those tumors which do not. Previous studies of CRC have used cohorts with mixed clinical stages. For example, nearly all of the studies included a higher number of CRCs from Stage I and II patients compared to Stage III and IV patients. Furthermore, nearly all the previous studies did not conduct an independent validation using a separate population of CRC patients with an independent validation study. The majority of these studies used low resolution molecular methods with reduced sensitivity for CNAs.

Interestingly, 17p loss has been previously reported to be prognostic marker for poor survival in other cancers, including brain tumors14,15,16, bone tumor17, periampullary cancer18, pancreatic cancer19, leukemia20, and in CRC21 evaluated with FISH. These results suggest that 17p loss may be generally useful for predicting patient outcomes. Other large-scale copy number events have been previously identified as prognostic markers for colorectal cancers, e.g. 18q deletion for stages II and III colon cancers22,23, which also lend support to the argument that CNA profiling may be useful for CRC management. While we did observe a slightly increased hazard (HR = 1.28) for patients carrying the 18q-loss in our cohort, the association did not reach statistical significance. This limitation came from the study being underpowered per the sample size.

We identified that chromosome arm 17p CNAs occurred consistently in both the discovery and validation cohorts. Minor differences were noted between the two cohorts, as the discovery cohort had chr16p and chr19 amplifications and chr21p deletion while the validation cohort had chr2q amplification and chr14q deletion. These minor differences are likely attributable to the limited cohort size such that not enough samples were available in either cohort for determining the statistical significance of recurrent events presented at lower frequency. It may also reflect the population specific genetics for CRC progression – additional studies will be required to clarify these differences.

We also detected a trend for 17p/TP53 loss as a predictive biomarker for poorer adjuvant chemo-therapy response in Stage III CRC. Although the finding did not achieve statistical significance due to the small cohort size, there are several pieces of additional evidences in the literature for consideration. For example, a recent publication, Oh et al.24 has found a low-expression of TP53 protein was associated with poor cancer-specific survival in Stage III and high-rick Stage II CRC patients (N = 621) who were treated with oxaliplatin-based adjuvant chemotherapy. Additionally, using the TCGA cohort, we found TP53 copy number loss were significantly associated with lower mRNA expression level (P < 1e-15, one tailed T-test on Z-normalized mRNA expression levels, see Supplementary Fig. S4). All together, these findings suggested that the loss-of-function of TP53 protein, as genetically determined by the focal TP53 gene loss or arm-level chr17p loss, has important prognostic value for late stage CRC patients receiving adjuvant therapies.

To explain the effect of 17p loss, a likely mechanism is increased chromosomal instability, which was observed co-occurring with 17p loss. We analyzed the association between focal and arm-level chromosomal instability and 17p loss and found they were significantly associated with each other. In addition, other studies have shown that 17p loss-of-heterogeneity was correlated with CRC’s metastatic potential25. Similar findings were reported for other cancer types like brain tumor26 and esophagus cancer27. It has been reported that allelic loss of 17p allelic loss was highly correlated with TP53 mutations28. All of these findings suggest that the loss of 17p is directly related to higher CIN.

The loss of 17p thus could also be an indicator for the loss of TP53 function which is known to contribute to CIN in CRC. Vogelstein et al. theorized that the TP53 genetic alteration occurs at relatively later stage of colon cancer and is responsible for promoting tumor invasion to surrounding normal tissue29. Previously, point mutations involving TP53 were associated with poor survival in colorectal cancers30, as indicator of TP53 loss-of-function31. Our observation of extensive presence of 17p loss in Stage III CRCs provides additional supporting evidence to this conclusion.

Identifying copy number variation with high coverage WGS data have been studied extensively in basic and clinical research settings32,33,34. For example, in a recent systematic benchmark, Trost et al. reported good performance of using >20x WGS data for identifying small-scale CNVs (1–100 kb)35. The potential of using lower coverage WGS for characterizing germline CNVs and somatic CNAs has also been explored by others with cell line, blood, and fresh or archived tissue samples before. Citing a few examples, Zhou et al. showed WGS at 1x to 5x coverage outperforms array-base analysis for detecting large-scale CNVs in the NA12878 cell line36. In a larger study, Dong et al. showed ~50% diagnostic yields of detecting pathogenic germline CNVs when applying ~0.25x coverage WGS to more than five hundred miscarriage tissue or blood samples37. Using ~1x coverage WGS and samples from six multiple myeloma patients, Elnenaei et al. found >90% sensitivity and specificity when findings were compared to the FISH results. Kader et al. found that their modified low-coverage WGS at 1.6 to 1.8x achieve highly concordant CNA detection with array-based analysis in two Merkel cell carcinomas derived FFPE samples.

However, to date, there are very few if any studies which have demonstrated the scalability and robust performance of lower coverage WGS for profiling focal and arm-level CNAs (>100 kb) for hundreds of achieved, heterogenous cancer tissue in one study. In this work, we leveraged the generally higher resolution of WGS analysis while using a cost-effective low-coverage approach. To compensate for DNA degradation and potential low purity associated with FFPE clinical tissue samples, we aimed for 2–4x, which is at the high-end of coverage compared to other studies. To correctly identify somatic CNAs, we adopted a panel of normals approach to filter out any population specific copy number variable regions and germline events. To maximize cost-efficiency for assaying hundreds of samples, we adopted a pooling strategy with robot-assisted sample preparation.

We also considered the overall cost of this assay for potential clinical implementation as a routine test (Supplementary Fig. S5). Notably, the per-sample cost for WGS analysis, as calculated in US dollars, was related to the extent of sample multiplexing. When combining 10 samples together, the cost ranged from $480 - $960, while it decreases to $77–$144 per sample, assuming a batched size of 500 tumor biopsies. The lowest cost per-sample involved using the NovaSeq 6000 + S4 Flowcell platform with 2.4 Terabyte output – this enables sequencing 400 samples. Compared to per-sample microarray cost, approximately $200 per Affymetrix SNP-array 6.0, the per-sample WGS cost was determined to be lower when multiplexing samples on the HiSeq X and NovaSeq platforms.

Our results showed that, performance-wise, the low-coverage WGS consistently produced high concordance segmentation with microarray and high-coverage WGS. It has less noise and bias as compared to WES-based results. The low-coverage WGS provides thousands of read pairs per 100 kb segment, a substantive amount enough to enable sensitive CNA detection. This provides an improved resolution compared to SNP microarrays in which there are approximately 60 probes per 100 kb segment. At a given scale, routine clinical sequencing of FFPE biopsies can be done cost effectively. Thus, our cost-analysis showed that using low-coverage WGS is now competitive to microarray analysis for both performance and cost.

As WGS studies become less expensive, we foresee that in the future low-coverage WGS may prove to be replacing clinical microarray testing for cancers38, developmental disabilities, congenital anomalies39,40,41, autism spectrum disorder42, and many other genetic diseases32. Citing the benefits of WGS, a recent study compared the performance of low-coverage WGS versus microarrays on rare and undiagnosed cases. The conclusion of this study was that robust identification of CNVs was highly feasible with low-coverage WGS43. In another study, low-coverage WGS also found successful application in preimplantation genetic diagnosis of monogenic disease44.

Methods

Discovery cohort ascertainment

The Institutional Review Boards (IRB) from Stanford University and Intermountain Healthcare approved the study. A total of 134 patients were recruited through the Intermountain Cancer Center (St. George, Utah, USA). Selection criteria involved those diagnosed with Stage III CRC in 2001–2015. We excluded patients who survived less than 90 days after the initial diagnosis and died from non-cancer causes. We collected relevant clinical information from patient medical records, including age of diagnosis, gender, ethnicity, body mass index (BMI), and smoking status (Table 1 and Supplementary Table 1).

DNA extraction from clinical samples

All clinical samples were acquired with informed consent under an approved institutional review board protocol from the Intermountain Healthcare. We collected matched primary colorectal adenocarcinoma tumor and normal colon tissue samples from each patient (Fig. 1). All samples were determined to have greater than 60% tumor content in pathology review. We used a two-millimeter punch from a tumor or normal FFPE tissue block. The DNA was isolated from tissue using the Maxwell-16 and Promega-AS1030 DNA purification kit (Promega, Wisconsin, USA). The genomic DNA was quantified via the Qubit (Thermo-Fisher Scientific, Massachusetts, USA) and quality assessment was performed with the LabChip GX (PerkinElmer, Massachusetts, USA).

Sequencing

For sequencing library preparation, 500 nanograms of DNA from each sample was sheared using a Covaris E220 (Covaris, Massachusetts, USA) with microtube plates and following parameters: intensity level of five, duty cycle of 10%, cycles per burst of 200, and treatment time of 55 seconds. The DNA was then purified with a 0.8X AMPure XP (Beckman-Coulter, California, USA) bead cleanup to maintain a large insert size for sequencing. We used this total yield of purified DNA for the Kapa Hyper Prep Kit for Illumina (Roche, Basel, Switzerland). The standard KAPA protocol was followed with eight cycles of PCR amplification and a 0.8X post-amplification cleanup. We used 10 base pair dual-index sequencing adapters to allow for index swapping detection.

We measured the library quality with the LabChip GX and quantity with the Qubit (Supplementary Table 2). The libraries were pooled and sequenced on an Illumina MiSeq (Illumina, California, USA) for paired-end 300 basepair reads. The sequencing libraries were re-pooled and normalized based on the MiSeq data before paired-end 300 basepair sequencing on an Illumina NovaSeq 6000 system achieving 2–4x coverage per sample. Sequence reads were aligned to the human reference genome GRCh37/hg19 with the Burrows-Wheeler Aligner45.

Copy number segmentation

For determining which copy number segmentation tool provided accurate results on WGS from FFPE-extracted DNA, we evaluated the copy number callers, CNVkit46 and BicSeq47 – both are readily available as open source scripts. Segmentation involves defining the intervals that are affected by a copy number change. As test data set, we downloaded the low-coverage (~5x) WGS and the SNP-array data of 10 randomly selected (Supplementary Table 3) CRC tumor-normal pairs from TCGA. Using the WGS data, we inferred the genome segments with CNVkit and BicSeq, and estimated log copy number ratio per segment (CNR). For each sample, we correlated the WGS-estimated segmental CNRs to the microarray CNRs using 100-kb genome-wide tiles. We computed the Pearson’s correlation coefficient (PCC) between the two set of estimates and summarized PCC over all samples by mean and standard deviation. We compared the metric difference between groups using the two-sided T-test. We also conducted a gene-based PCC analysis using the same data.

Next, we evaluated how WGS coverage reduction affects genome segmentation. We applied ~30X high-coverage WGS analysis to eight patients with the identical protocol to low-coverage WGS. The high-coverage sequence data were down-sampled to low-coverage (~3x) data. We performed genome segmentation using CNVkit on both the high- and low-coverage WGS data. We computed and compared the PCC metrics for high- and low-coverage CNRs based on tumor purity.

WES and targeted sequencing are other common choices for CNA analysis. We also benchmarked low-coverage WGS to high-coverage WES (~300x) in a random subset of 10 patients. We performed genome segmentation using CNVkit on both the high-coverage WES and low-coverage WGS data. We computed and compared the intra-tile deviation metrics of estimated segmental log CNRs based on 100 kb genome tiles.

Integrated copy number analysis pipeline

We integrated CNVkit46, Gistic248, coxph, survival and glmnet49,50 packages of R into our final copy number analysis bioinformatics pipeline (Fig. 1). We used the data from the genome segments inferred by CNVkit to Gistic2, a cohort CNA caller. We ran Gistic2 with the following arguments: “-refgene hg38.UCSC.add_miR.160920.refgene.mat -maxspace 10000 -ta 0.1 -td 0.1 -qvt 0.25 -broad 1 -brlen 0.7 -twoside 1 -conf 0.99 -genegistic 1 -armpeel 1 -savegene 1 -res 0.05 -smallmem 1 -js 4”. We set the noise cut-off for both deletion and amplification to 0.1. Coupling CNVkit with Gistic248 enabled us to identify recurrent arm and focal-level CNAs with statistical significance. We also integrated and ggplot2, survminer, ggpubr, inferCNV R packages51 for data visualization.

To control for false positives, we identified error-prone CNA regions that demonstrated a high level of CNV background noise using normal DNA samples which had no somatic copy number changes. We ran genome segmentation and CNA calling on all normal DNA samples 10 times and each time with one random normal sample as reference. We compiled all of the CNA calls and identified regions that demonstrated copy number changes in >10% of samples in each run for >5 runs. These changes were likely the result of false copy number calls that were specific to FFPE-extracted DNA, amplification bias or sequencing artifacts. We filtered out the false positive segments from those noisy regions before conducting the Gistic2 analysis.

Stage III-specific biomarker discovery

We first fitted a multivariate cox model with relevant clinical and demographical covariates to identify any such variables was associated with survival. The full model is as follows:

$$\begin{array}{c}(time,\,status) \sim gender+ethnicity+age+treatment+colonSide+cancerGrade+\\ \,recurrence+BMI+smokingStatus\end{array}$$
(1)

Next, we created binarized CNA variables for each arm-level deletion or amplification (e.g. chr1pdel or chr1pamp for chr1) using the Gistic2 output. The variable is coded one if the CNA amplitude exceeds the noise cut-off, otherwise zero. We fitted a multivariate cox regression model with lasso-regularization to select for candidate CNA biomarkers52, including all the 88 arm-level CNA variables, gender, ethnicity, age and treatment.

Finally, we tested the lasso-selected candidate CNA variables’ significance by the following multivariate cox regression model:

$$(time,\,status) \sim candidateCNA+gender+ethnicity+age+treatment+colonSide$$
(2)

The resulting p-values were adjusted by the Bonferroni correction and significant results were declared with Bonferroni-adjusted P < 0.05.

Biomarker validation

CNA biomarkers were validated with an independent cohort of 187 Stage III CRCs from TCGA (Supplementary Table 4). We downloaded the overall survival time, survival status, SNP-array based segments, and other clinical data for these patients in the validation cohort. The genome segmentation data was formatted for Gistic2 analysis. With this independent validation dataset, we tested the same multivariate model (Eq. 2), including these candidate CNA markers in question. A candidate marker was declared statistically significant only if the Bonferroni-adjusted P < 0.05.

Chromosomal-instability analysis

We measured chromosomal instability (CIN) at focal and arm levels by counting the total number of such events sample-wise. We denoted the arm-level CIN by the variable D_N_Arms, which is calculated as the total number of arm-level CNAs per sample. Similarly, we denoted focal-level CIN by the variable D_N_Focals, which is calculated as the total number of focal-level CNAs per sample. We applied a T-test to determine if any significant difference in CIN in patient groups as stratified by chr17pdel status using the D_N_Focals and D_N_Arms measures.

Ethics approval and consent to participate

The Institutional Review Boards from Stanford University and Intermountain Healthcare approved the study. The study was performed in accordance with the Declaration of Helsinki. All samples were acquired with informed consent under an approved institutional review board protocol from the Intermountain Healthcare.