The detection and implication of genome instability in cancer

Genomic instability is a hallmark of cancer that leads to an increase in genetic alterations, thus enabling the acquisition of additional capabilities required for tumorigenesis and progression. Substantial heterogeneity in the amount and type of instability (nucleotide, microsatellite, or chromosomal) exists both within and between cancer types, with epithelial tumors typically displaying a greater degree of instability than hematological cancers. While high-throughput sequencing studies offer a comprehensive record of the genetic alterations within a tumor, detecting the rate of instability or cell-to-cell variability using this and most other available methods remains a challenge. Here, we discuss the different levels of genomic instability occurring in human cancers and touch on the current methods and limitations of detecting instability. We have applied one such approach to the surveying of public tumor data to provide a cursory view of genome instability across numerous tumor types.


Introduction
Cancer is a disease characterized and fuelled by dynamic genomic changes. The vast number of structural abnormalities present in cancer genomes is largely attributed to genomic instability, a transient or persistent state that increases the spontaneous mutation rate, leading to gross genetic alterations such as rearrangements and changes in chromosome number (aneuploidy). Genomic instability is therefore a driving force of tumorigenesis in that continuous modification of tumor cell genomes promotes the acquisition of further DNA alterations, clonal evolution, and tumor heterogeneity [1]. It is a feature of almost all cancers and has been observed in a range of malignant stages, from pre-neoplastic lesions prior to acquired TP53 mutations to advanced cases [2][3][4]. Numerous theories regarding the source of genome instability have been proposed. These theories, which include the mutator phenotype, DNA damage-induced replication stress, telomere dysfunction, and mitotic checkpoint failure [5][6][7][8][9][10][11], vary principally in their supposition of how early in tumorigenesis instability occurs, mechanisms leading to sequence level alteration, and whether instability initiates tumorigenesis or is merely a consequence of malignant transformation. While these mechanisms may all contribute to instability phenotypes to some extent in cancer in general, their prevalence varies across tumors derived from distinct cell types or in response to different carcinogens or selective pressures.
Genomic instability refers to a variety of DNA alterations, encompassing single nucleotide to whole chromosome changes, and is typically subdivided into three categories based on the level of genetic disruption. Nucleotide instability (NIN) is characterized by an increased frequency of base substitutions, deletions, and insertions of one or a few nucleotides; microsatellite instability (MIN or MSI) is the result of defects in mismatch repair genes, which lead to the expansion and contraction of short nucleotide repeats called microsatellites; chromosomal instability (CIN) is the most prevalent form of genomic instability and leads to changes in both chromosome number and structure [12]. While instability is a characteristic of almost all human cancers, cancer genomes vary considerably in both the amount and type of genomic instability they harbor. Importantly, the instability phenotype has implications for patient prognosis as well as patient management, specifically the choice of therapeutic agents [13][14][15].
Currently, detection of genome instability can be achieved using a variety of technologies, ranging from single-cell approaches to high-throughput multicellular techniques, each capable of detecting different levels of genomic changes. However, at present, no assay is capable of reliably measuring the rate (cell-to-cell variability) of small chromosomal changes such as deletions, amplifications, and inversions within a population of cells. There is therefore a great need for sensitive, high-resolution techniques capable of detecting genomic instability over time, as this would afford critical insights into the mechanisms that underlie genomic instability and the role of instability in tumorigenesis. In this review, we discuss the different levels of genomic instability, the various methods for detecting instability and their limitations, and describe global trends in genome instability across numerous tumor types.

Levels of genomic instability

Nucleotide instability
NIN typically develops due to replication errors and impairment of the base excision repair and nucleotide excision repair pathways, leading to subtle sequence changes involving only one or a few nucleotides (substitutions, deletions, insertions, etc.) which can affect gene structure and/or expression (Fig. 1a). While less common than the other forms of genomic instability, when present, single nucleotide alterations can cause dramatic phenotypes. For example, inherited defects in these repair pathways (germline mutations in XPC, ERCC2, DDB2, and MYH) lead to disorders such as xeroderma pigmentosum and MYH-associated polyposis, which result in genomic instability and the accumulation of DNA mutations, consequently predisposing these individuals to skin and colon cancers, respectively [16,17]. Similar to the nuclear genome, the mitochondrial genome also displays NIN, and coupling of the high rate of reactive oxygen species generation with inefficient DNA repair can result in a rate of mtDNA mutations that is substantially higher than that of nuclear DNA [18,19].

Microsatellite instability
Microsatellites are repetitive DNA sequences comprising repeat units of 1-6 bp located throughout the genome [20][21][22]. Within the population, microsatellite size is highly variable; however, each individual possesses unique microsatellites of a set length. MSI results from defects in DNA mismatch repair (MMR), specifically alterations of the MLH1, MSH2, MSH6, and PMS2 genes, which cause deletions or random insertion and expansion of microsatellites and a hypermutable phenotype (Fig. 1b). MSI is a characteristic feature of a number of cancers, including gastric, endometrial, ovarian, lung, and colorectal cancer (CRC), where it was first described and has been studied most extensively [23][24][25][26][27][28]. MSI occurs in approximately 15 % of CRC, which typically arise in the proximal colon, possess a normal karyotype, and are associated with a better prognosis than non-MSI tumors.

Chromosomal instability
CIN is an increase in the rate of gain or loss of segmental and whole chromosomes during cell division and is the most prominent form of genomic instability in solid tumors, with roughly 90 % of human cancers exhibiting chromosomal abnormalities and aneuploidy [3,39]. CIN tumors are characterized by global aneuploidy, amplifications, deletions, loss of heterozygosity (LOH), and homozygous deletions (Fig. 2). These alterations lead to karyotypic instability and the simultaneous growth of diverse tumor subpopulations, resulting in genomic inter- and intra-tumor heterogeneity [39]. CIN develops early in tumorigenesis (detectable in premalignant lesions) and is associated with intrinsic multidrug resistance [40] and poor prognosis [15,41], making its detection clinically relevant.
Despite the prominence and fundamental importance of CIN to cancer biology, the molecular mechanisms underlying CIN in sporadic cancers remain poorly understood. This is due primarily to the fact that disruption of countless genes can give rise to CIN, including, but not limited to, those involved in chromosome condensation and segregation (STAG2) [42], telomere dysfunction (TRF1 and Tankyrase) [43], as well as DNA damage (ATM) [44,45] and spindle checkpoint genes (BUB1, Mad2) [46][47][48], highlighting the heterogeneous nature of CIN in sporadic cancers. Attempts to explain the presence and molecular basis of CIN in sporadic cancers have led to the development of three prevailing theories: the mutator hypothesis, the oncogene-induced DNA damage model, and instability due to telomere erosion, which are reviewed in [5,10,49].
The advent of sequencing technologies has led to the recent discovery of an intriguing form of genome chaos and CIN, whereby only one or a few distinct chromosomes in a cancer cell are characterized by the presence of upwards of hundreds of complex genomic rearrangements [11]. These distinct chromosomal rearrangements were proposed by Stephens et al. to have developed through chromosome shattering ("thripsis" in Greek) or incomplete fragmentation and the inaccurate stitching together of chromosomes in a single stochastic event in a process termed "chromothripsis," an event in contrast to the widely accepted notion of gradual accumulation of cancer genome rearrangements. Chromothripsis has been proposed to occur in ∼2-3 % of a wide spectrum of cancers (with a higher incidence in bone cancers), where chromosome-specific massive rearrangements have been described [11,50]. The mechanisms underlying chromothripsis, and its clinical implications, have been recently reviewed by Forment et al. [51].

Interplay between instability types
While all levels of instability can co-occur within the same cell, and work in concert to disrupt a single gene, protein complex, or pathway, in colorectal and endometrial cancers, an inverse relationship between CIN and MIN has been observed [52,53]. Although both types of instability appear to occur early in tumor development and increase with tumor progression, cancers with an MMR deficiency tend to be diploid and exhibit normal rates of gross chromosomal changes, whereas MMR-proficient tumors are typically aneuploid and display increased rates of chromosomal alterations [12]. Moreover, the fusion of MIN and CIN cells results in CIN, but not MIN, suggesting that CIN is a dominant phenotype that may result from gain-of-function alterations rather than gene inactivation [3,46].

Methods for the detection and analysis of genome instability
A number of established strategies exist to detect genomic instability in cancer. However, it is important to keep in mind that genomic instability is a matter of rate of chromosomal alterations and is therefore a gauge of variability in chromosomal state between individual cells within a tumor [54]. To accurately assess instability, repeated measurements across cell populations throughout tumor evolution or, ideally, measurements in individual cancer cells are required to define the actual rate or variability in genomic changes for a particular tumor [54]. Although these measurements are more easily obtainable for cancer cell lines, measuring genome stability accurately in clinical tumor specimens where material is often limited and substantial cellular heterogeneity exists is considerably more difficult. As a result, few studies have determined the actual rate of chromosomal alterations in different cancer types and, thus, characterized true genomic instability [54]. Because of the difficulty in measuring actual genomic instability, various methods to calculate the frequency and extent of genomic changes for static tumor cell populations have been used as a surrogate to describe genomic instability. Therefore, caution must be taken when interpreting claims about instability in cancer. Since genomic instability occurs across multiple genetic levels, any method capable of detecting chromosomal, microsatellite, or nucleotide changes is adequate to measure a component of genomic instability. Such methods include, but are not limited to, karyotyping, flow cytometry, single nucleotide polymorphism (SNP) arrays, genome sequencing, and polymerase chain reaction (PCR), which are summarized in Table 1.

Single-cell approaches
Karyotyping is the visualization of a cell's entire complement of chromosomes, or karyotype. Assessment of a cell karyotype enables the identification of abnormalities in chromosome number (aneuploidy) and large structural rearrangements like inversions and translocations [55,56]. Traditionally, metaphase chromosomes are stained with a DNA-binding dye, such as Giemsa stain, which is taken up readily by gene-poor A,T-rich genomic regions and results in a chromosome-specific banding pattern that can be used to differentiate chromosomes and identify abnormalities. The use of multicolored fluorescence in situ hybridization (FISH) probes has greatly facilitated the assessment of CIN and is referred to as spectral karyotyping (SKY). The SKY technique results in coloring, or painting, of each chromosome with a different colored fluorophore, readily enabling the identification of chromosomes and rearrangements [55,57]. Although excellent for detecting global CIN changes, even the most advanced FISH strategies cannot accurately measure somatic mutations throughout the genome. While karyotyping is one of the few techniques available that enable the identification of alterations within a single cell, and the only one capable of profiling both clonal and non-clonal chromosomal alterations [58], like most other methods, it offers only a static picture of the state of chromosomal alterations with no information regarding the extent of variability between cells. Furthermore, it is labor-intensive and metaphase spreads from even short-term cultures can acquire culturing artifacts that induce additional genomic changes. Despite these limitations, karyotyping remains the most reliable method to detect non-clonal chromosomal aberrations and assess genomic variability among cells.
Advances in next-generation sequencing and whole-genome amplification technologies have enabled the advent of single-cell sequencing, which offers promising insight into understanding genomic instability as it provides not only a comprehensive look at the state of genomic alterations of a tumor cell but also cell-to-cell heterogeneity. Because single-cell sequencing relies on gene amplification, sequence bias and adequate genome coverage remain major challenges. However, new amplification methodologies such as multiple annealing and looping-based amplification cycles, which enable over 90 % genome coverage and can accurately detect mutations and copy number variations [59], are in development and have the potential to greatly improve single-cell sequencing. Although many obstacles remain before single-cell sequencing can be routinely implemented as a standard procedure for detecting genome instability, it has the ability to provide an unprecedented view of genomic instability.

Multicellular approaches
Flow cytometry, which measures cells in suspension as they pass through a laser, scatter light, and emit fluorescence, can be used to approximate cellular aneuploidy. This strategy estimates cell ploidy based on DNA content (which correlates to the intensity of fluorescence) and the stage of cells in the cell cycle. Comparison of the estimated ploidy in the G0/G1 fraction of malignant and normal cells allows a gross estimate of genome instability in cancer cells [60,61]. While flow cytometry is extremely accurate in its ability to estimate ploidy, it provides no information regarding NIN, MSI, or the segmental or whole-chromosome aberration components of CIN.
Array comparative genomic hybridization (aCGH) offers the ability to quantitatively detect and visualize whole and segmental chromosomal alterations such as gains, losses, amplifications, and LOH [62,63]. Briefly, reference genomic DNA and test DNA are differentially labeled, pooled, and hybridized onto arrays comprising BAC, cDNA, or oligonucleotides, and imbalances are visualized as differences in fluorescence intensity. The advent of SNP arrays offered improved resolution, enabling more precise mapping of copy number alterations and the detection of uniparental disomy (copy neutral loss of heterozygosity) as well as the ability to distinguish alleles at specific polymorphic sites [64][65][66]. However, neither aCGH nor SNP arrays are able to detect translocations, inversions, or somatic mutations.
PCR is the gold standard for detecting MSI. PCR is used to amplify known microsatellite regions, and the lengths of the short tandem repeats (PCR products) are compared in tumor and normal DNA to determine the state of MSI [37,67,68]. This approach is therefore limited to assessing MSI. PCR is also used routinely for the analysis of mitochondrial instability. The ability to isolate mtDNA from total DNA using mitochondrial-specific primers rather than through centrifugation not only reduced tissue requirements but also enabled the use of archival paraffin-embedded tissues, greatly expanding the number of samples available for analysis [19]. Commonly used markers of mitochondrial genome instability detected by PCR and followed by direct sequencing include point mutations, insertions, deletions, and length changes in homopolymeric or dimeric nucleotide tracts. Competitive PCR, in which a competitor DNA fragment is added to the DNA sample, can be used to determine mitochondrial DNA copy number by determining the ratio between the intensities of the control and the sample PCR product bands [18].
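The tumor-versus-normal comparison of microsatellite PCR product lengths described above can be illustrated with a minimal Python sketch. The function name, the marker names, and the input layout (a dictionary mapping each marker to its observed fragment lengths in bp) are hypothetical and chosen only for illustration. The sketch also assumes exact fragment sizing; real assays must additionally account for stutter peaks and sizing error, which are ignored here.

```python
def unstable_markers(tumor_alleles, normal_alleles):
    """Return markers whose tumor sample shows fragment lengths
    that are absent from the patient-matched normal sample.

    Both arguments map a marker name to a list of fragment lengths (bp).
    Assumption of this sketch: any novel length counts as instability.
    """
    unstable = []
    for marker, normal_lengths in normal_alleles.items():
        tumor_lengths = tumor_alleles.get(marker)
        if tumor_lengths is None:
            continue  # marker was not typed in the tumor; skip it
        novel = set(tumor_lengths) - set(normal_lengths)
        if novel:
            unstable.append(marker)
    return sorted(unstable)


# Toy data (not from any study): marker_B gains a shorter allele in the tumor.
normal = {"marker_A": [120, 124], "marker_B": [150]}
tumor = {"marker_A": [120, 124], "marker_B": [150, 146]}
result = unstable_markers(tumor, normal)
```

In this toy input only `marker_B` carries a length not seen in the normal DNA, so it is the only marker reported.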
While not routinely performed at the single-cell level, whole-genome sequencing is arguably the most comprehensive and informative method of profiling the cancer genome. In a single experiment, sequencing is capable of identifying nucleotide substitutions, insertions or deletions, and larger genomic rearrangements such as copy number changes, inversions, and translocations, simultaneously capturing all levels of genomic instability for a given population of tumor cells (Fig. 3) [69,70]. The presence and extent of somatic mutations are determined informatically using computational programs for variant calling. These programs compare the sequences of both the tumor and patient-matched normal sample to a reference genome to reveal somatic and germline alterations, providing confidence calls for each mutation [69,71]. Copy number analysis by sequencing (at either high or low coverage) offers substantial benefits over array-based methods, including higher resolution (down to a single base) and precise delineation of breakpoints [72,73]. Copy number ratios at each genomic locus are estimated by counting and comparing the number of reads in both tumor and normal samples. Furthermore, whole-genome sequencing provides data on non-coding regions (promoters, enhancers, introns, and non-coding RNA) as well as un-annotated regions, requiring no a priori knowledge of genome sequence, facilitating the discovery of novel DNA sequences.
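The read-counting step for copy number estimation can be sketched as follows. This is a minimal illustration, not the method of any particular copy number caller: the function name, the fixed-bin input layout, the library-size normalization, and the pseudocount are all assumptions introduced here.

```python
import math


def log2_copy_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-bin log2(tumor / normal) read-depth ratio.

    tumor_counts and normal_counts hold read counts for the same
    genomic bins, in the same order. Counts are scaled by each
    sample's total so that sequencing depth differences cancel out.
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("tumor and normal must cover the same bins")
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        # The pseudocount (an assumption of this sketch) avoids
        # log(0) and division by zero in bins with no reads.
        t_frac = (t + pseudocount) / tumor_total
        n_frac = (n + pseudocount) / normal_total
        ratios.append(math.log2(t_frac / n_frac))
    return ratios


# Toy counts (not real data): the third bin has twice the tumor reads.
ratios = log2_copy_ratios([100, 100, 200, 100], [100, 100, 100, 100])
```

Because the tumor total is inflated by the gained bin, the unchanged bins come out slightly negative in this toy case; production tools typically correct for this with a more robust normalization (for example, centering on the median bin).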
Sequencing studies have provided massive amounts of data on cancer genomes, revealing great diversity in the mutation frequency across tumor types and identifying novel rearrangements in epithelial cancers. As data from sequencing studies continue to emerge in the public domain, a large-scale pan-cancer comparison of genomic instability in different cancers will be feasible. Such an analysis may shed more light on the mechanistic differences of cancer development in different tissues, which itself will improve our understanding of cancer biology and our ability to develop rationally designed therapies. The interpretation of whole-genome sequencing data in the context of heterogeneous tumors, however, remains a considerable challenge to the application of such data to patient care.
The fundamental limitation of these multicellular approaches is that they provide only a snapshot of the state of alterations in a tumor sample and are incapable of defining the rate of chromosomal changes within a tumor, although both state and rate are needed to define genomic instability. While single-cell approaches such as karyotyping or single-cell array-CGH allow for unbiased comparisons of variability in chromosomal alterations between cells, they are not amenable to automation and are therefore time-consuming and labor-intensive. Collection of repeated tumor biopsy samples and advances in single-cell profiling technologies will help generate more accurate metrics of genomic instability.

Pan-cancer trends in CIN
It is well established that vast genome instability exists at different levels and to different extents in various tumor types. In the last decade, several large-scale sequencing studies have been undertaken in an attempt to characterize recurrent alterations in cancer genomes [74][75][76][77][78][79]. While thousands of mutations have been identified, these studies have shown that very few genes are recurrently mutated, deleted, or amplified at high frequencies within a tumor type. Of the handful of recurrently altered genes, TP53 is the most frequently altered gene in all tumor types, while the others (CDKN2A, PTEN, EGFR, and RAS) have roles in regulating growth and encode classical tumor suppressors and oncogenes [74,76,80,81].
In general, epithelial tumors are thought to be more genomically unstable than hematologic and mesenchymal malignancies, in which a high proportion of cases are characterized by specific genetic rearrangements such as translocations [82]. Interestingly, certain cancer types display characteristic instability phenotypes. For instance, BRCA-associated breast and ovarian cancers demonstrate high levels of CIN, whereas lung cancer in smokers and never smokers differs in the extent of segmental alterations and, consequently, genome instability [83][84][85][86]. Moreover, specific subtypes of breast, ovarian, and lung cancers exhibit distinct patterns of alterations; the basal-like subtype of breast cancer (typically estrogen receptor-negative) has greater CIN than luminal subtypes, while type II high-grade serous ovarian carcinomas have greater CIN than type I serous ovarian cancers [87,88]. In lung cancer, adenocarcinoma and squamous cell carcinoma demonstrate distinct patterns of genomic alterations, and within lung adenocarcinoma, the magnoid subtype displays higher CIN than other adenocarcinoma subtypes [89,90]. A review of genome sequencing studies revealed that epithelial-derived cancers such as breast, non-small cell lung, small-cell lung, melanoma, and prostate cancers have a greater number of somatic mutations than blood cancers including acute myeloid leukemia [91], which could suggest that epithelial cancers have greater nucleotide instability. However, specific environmental exposures, such as tobacco smoke, can have specific signatures in terms of epigenetic and genetic alterations in tumors, making it difficult to determine whether the mutations detected arose from nucleotide instability within a tumor or from carcinogen exposure [91]. As more cancer genome sequence data become publicly available, it will be interesting to determine whether specific cancer types exhibit a mutator phenotype and harbor greater nucleotide instability than others.
Fig. 3 Information provided from whole-genome sequencing. a Legend depicting the genomic data (rearrangements, SNPs, LOH, lesser allele fraction, copy number, somatic mutations, and genes affected by these alterations) available following whole-genome sequencing. b Circos plot of a lung adenocarcinoma tumor from a never smoker referenced against the matched non-malignant tissue
To compare CIN trends in a pan-cancer manner, we accessed and interrogated copy number data for a set of 2,201 tumor samples representing 24 cancer types made publicly available by the Broad and Dana Farber Cancer Institutes (Table 2). Segmented tumor data were downloaded (http://www.broadinstitute.org/tumorscape/pages/portalHome.jsf); any segment with a log2 ratio above 0.1 or below -0.1 was defined as a segmental alteration. Next, we calculated the fraction of each cancer genome encompassed by segmental alterations to determine the proportion of the genome altered (PGA) and summarized the PGA across the various malignancies (Table 2) [86]. Cancer cell lines were not included in our analysis.
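The PGA calculation described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the exact procedure used for Table 2: the function name and the `(start, end, log2_ratio)` segment layout are hypothetical, the denominator is taken to be the total length covered by the supplied segments, and each altered segment is counted as one copy number alteration without merging adjacent segments.

```python
def proportion_genome_altered(segments, threshold=0.1):
    """Compute PGA and the number of altered segments for one sample.

    segments: iterable of (start, end, log2_ratio) tuples, with
    end > start in base pairs. A segment counts as altered when its
    log2 ratio lies above +threshold or below -threshold.
    """
    total_length = 0
    altered_length = 0
    n_altered = 0
    for start, end, log2_ratio in segments:
        length = end - start
        if length <= 0:
            raise ValueError("segment end must be greater than start")
        total_length += length
        if abs(log2_ratio) > threshold:
            altered_length += length
            n_altered += 1
    if total_length == 0:
        raise ValueError("no segments supplied")
    # Assumption: the covered length stands in for the genome size.
    return altered_length / total_length, n_altered


# Toy segments (not real data): 100 of 400 bp exceed the threshold.
toy_segments = [
    (0, 100, 0.0),     # neutral
    (100, 150, 0.5),   # gain
    (150, 200, -0.3),  # loss
    (200, 400, 0.05),  # below threshold, treated as neutral
]
pga, n_cna = proportion_genome_altered(toy_segments)
```

For the toy segments, two segments spanning 100 of the 400 covered base pairs pass the threshold, so the sketch returns a PGA of 0.25 and two alterations.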
Of the 18 cancer types with at least six representative samples, mesothelioma and small-cell lung cancer had the greatest PGA and average number of copy number alterations (CNAs), suggesting that they may be the most genomically unstable in the context of CIN (Fig. 4). These two cancers were followed by breast, ovarian, non-small cell lung cancer, and liver, all epithelial cancers. The number of CNAs was highly correlated with PGA as greater PGAs were associated with a greater number of CNAs (Pearson's correlation: r = 0.77) across all tumor samples. Of note is that hematological malignancies including acute lymphoblastic leukemia, myelodysplasia, and myeloproliferative disorder harbored some of the lowest PGAs. Thus, these results suggest that CNAs are highly correlated with PGA and that CIN may be greater in epithelial tumors than in hematological cancers, consistent with previously reported trends (Table 2 and Fig. 4).
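The Pearson correlation between per-sample CNA counts and PGA reported above follows the standard formula, which can be sketched in plain Python as below. The function name is hypothetical, and the input values are toy numbers for illustration only; they are not the study data and will not reproduce r = 0.77.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy per-sample values (illustrative only, not from Table 2).
cna_counts = [5, 12, 30, 44, 60]
pga_values = [0.02, 0.10, 0.25, 0.30, 0.55]
r = pearson_r(cna_counts, pga_values)
```

A value of r close to 1 indicates that samples with more CNAs also tend to have a larger altered fraction of the genome.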

Conclusion
Genomic instability occurs early in tumorigenesis, increasing the spontaneous mutation rate and enabling the acquisition of DNA alterations that promote the hallmarks of cancer, thereby driving tumor development. While the molecular basis of instability is well understood in hereditary cancers, where it is linked to mutations in DNA repair genes, the basis of instability in sporadic cancers remains poorly defined. This limited understanding is due both to genomic heterogeneity between tumor types and within individual tumors and to a lack of methods capable of capturing both the state and rate of instability, which are required to determine the true measure of instability. Genome sequencing studies have provided a wealth of information regarding the state of instability in a variety of cancers, highlighting the diversity in both the types and amounts of instability observed in tumor genomes. As the amount of starting material required for whole-genome sequencing experiments continues to decrease, single-cell sequencing will become feasible for solid tumors; with this will come an expanded understanding of which mechanisms of genomic instability are selected for and precisely how specific patterns of instability support tumor growth in unique systems.
In combination with repeat biopsies and sequencing of multiple areas in a single tumor, detailed maps of how genomic instability changes over time will emerge, which can then be interpreted in the context of unique selective pressures in the tumor microenvironment (e.g., the immune system, chemotherapy) or correlated to specific clinical features (e.g., tumor progression). Genomic instability remains an important, yet poorly defined, mechanism by which tumors accelerate their own evolution and survival. At the same time, once uncovered, these same mechanisms will undoubtedly present to the researcher a host of novel therapeutic opportunities.
Fig. 4 Pan-cancer trends in genome instability. a Average number of copy number alterations for cell lines from each cancer type. Error bars represent standard deviation. b Average percent of the genome altered for all cell lines within each tumor type. Error bars represent standard deviation