Introduction

Activated oncogenes are a key feature of cancer development from its earliest stages [1]. One of their major effects is the induction of DNA damage via replication stress (RS) [2]. Specifically, oncogene-induced DNA replication stress (OIRS) leads to the formation of DNA double-strand breaks (DSBs) through the collapse of replication forks (RFs), fueling genomic instability (GI) [2, 3]. In early precancerous lesions, the collapse of DNA RFs occurs preferentially at specific loci termed fragile sites (FSs) [4]. As a result, FSs exhibit breaks, gaps, and rearrangements, collectively termed FS expression. This is due to the activation of pathways responsible for fork collapse resolution and completion of DNA replication, which involve recombinogenic processes and the production of DNA DSBs [2].

An important issue is whether the instability manifested at these sites has any wider biological impact on cancer development. As further discussed in the manuscript, although FSs are heterogeneous in their expression patterns, they possess unique features that make them vulnerable to structural destabilization under RS conditions. Are these regions simply prone to DNA damage due to their intrinsic characteristics, contributing only to GI? Sparse evidence indicates that FSs enclose genes and non-coding RNAs, like microRNAs (miRs), while their expression could be epigenetically modulated by histones, implying that they are regions of the genome of a higher organization level (Fig. 1) [5–9]. Important bioinformatic resources are currently available and can be exploited to define potential topological associations between common fragile sites (CFSs) and these elements. Notably, miRBase is constantly expanding, while the ENCODE project [10] has deposited information on a vast range of binding elements and genomic modifications, including histone marks (like H3K79me2, H3K9ac, H3K4me3, and H3K27ac) that have a prominent influence on the expression process of the genome. The pattern of instability at FSs in human tumors is variable, suggesting that it also depends on the cell type, which further complicates the role of FSs in malignancy. Last but not least, if such sites contribute to cancer development, why have they not been evolutionarily selected for elimination? Could there be a higher reason that makes them at the same time vulnerable “units” of the genome with potentially meaningful function? Attempting to address these questions, in the current work we conducted an extensive review on the nature of their heterogeneity that accounts for their preferential instability. Next, by applying bioinformatic tools to data from the latest miRBase and the ENCODE project, we reveal that these sites are enriched in various (coding and non-coding) elements, such as cancer-related genes, miRs, and binding elements, as well as specific variations in histone modifications (Fig. 1). Based on these findings, we propose that these sites may represent unique “functional” units of the genome that may have a complex role upon OIRS, with implications both in normal cell survival and cancer progression.

Fig. 1

CFSs are not only vulnerable structural domains but may also be functional units of the genome that are sensitive to replication stress. CFS stability is affected by replication stress (RS). During cancer development, CFSs are affected from the earliest precancerous lesions due to oncogene-induced replication stress (OIRS). Breakage at CFSs (broken red rectangle) may not only contribute to genomic instability (GI) (dashed black rectangle), but could also have wider biological implications by affecting elements located within them (question mark). DSBs: DNA double-strand breaks

Heterogeneity of fragile sites

FSs have been assigned to two classes, defined as rare fragile sites (RFSs) and common fragile sites (CFSs). RFSs are mainly induced by folate deficiency, correspond to dinucleotide or trinucleotide repeats, usually (CGG)n, and are found in less than 5 % of the human population and in specific families [11]. Their fragility is due to expansions of the micro- or mini-satellite sequences that they contain, which in some cases are responsible for inherited diseases [11]. RFSs will not be discussed further in this work.

Historically, CFSs were recognized as recurrent hotspots of double-stranded DNA breaks in cultured lymphocytes from healthy individuals [12]. They are present in all individuals, are part of the normal chromosomes, but exhibit different frequencies of expression in a population (reviewed in [13, 14]). CFSs are typically vulnerable to extrinsic replication stress, most notably to aphidicolin (APH), an inhibitor of DNA polymerase α, δ and ε, but are otherwise quiescent under normal conditions. This observation has been gradually broadened to include breakage patterns resulting from various replication inhibitors, such as nucleotide analogs (5-azacytidine, bromodeoxyuridine) or antitumor antibiotics (distamycin) and RS resulting from folate deficiency. Dietary and environmental factors like caffeine, cigarette smoke, and hypoxia may also enhance FS expression [15]. Recently, the induction of stress during the early S phase in B lymphocytes by hydroxyurea has been found to provoke DNA damage in a distinct pattern, corresponding to a new class of “early replicating fragile sites” (ERFSs) [16]. They occur primarily in early replicating DNA, close to replication origins, and are mainly situated in actively transcribed gene clusters (coding regions) [17]. This contrasts with CFSs, like FRA3B, which are most sensitive during their replication in late S phase [18]. Nevertheless, ERFSs seem also to arise from RF collapse and are similarly sensitive to ATR inhibition and oncogene-induced stress (see A. Nussenzweig chapter in this issue). OIRS is expected to induce instability at both ERFS and CFS, as suggested in two independent studies [16, 19]. Recent reports using the phosphorylated form of histone H2AX, the γ-H2AX, as a marker of DSB induction showed that ERFS were enriched for H2AX and γ-H2AX, while CFSs and heterochromatin lacked both, also suggesting differential DNA damage response at these sites [20]. 
Notably, both ERFSs and CFSs are enriched in CpG-rich regions [17, 19], implying either that these classes of FSs share structural similarities or that the classification of their members as early or late replicating needs re-assessment. Surprisingly, telomeric regions also appear to exhibit fragility in a manner similar to CFSs upon replication stress, including APH treatment [21].

It has been shown that CFS expression patterns depend not only on culture conditions but also on cell type [22]. Although traditionally studied almost exclusively in lymphocytes, different CFSs have been observed in fibroblasts [22, 23], breast, and colon epithelial cell lines [24, 25] and erythroid cell lines [25]. Considerable overlap exists between experiments, but the relative frequency of CFS breaks varies significantly. For example, FRA3B is the most frequent fragile locus in lymphocytes, but does not seem to be fragile in epithelial cell lines [24, 26], possibly due to the plasticity of replication programs in different cell lineages or because of a putative “housekeeping” role of FHIT. The second most fragile site, FRA16D, is very frequently affected in epithelial breast cancer cell lines (20–25 %), but only occasionally in colon epithelial cells (~5 %). Clearly, a complete characterization of CFS breakage probabilities will require a panel of different cell types.

CFSs are found in different individuals and are conserved across different species, including the mouse, the rat, and many other mammals [27, 28], and across kingdoms, such as in the yeast S. cerevisiae [29]. Evolutionary conservation could argue in favor of a meaningful function if CFSs are considered outliers compared with the overall fragility of the genome. Nevertheless, variation between individuals can be significant. In a study of 20 normal adults [30], only FRA3B and FRA16D were found to be fragile in all individuals, and only 42 % of CFSs (19 of 45 identified) were present in the majority of individuals. In the earliest studies, fewer than 20 CFSs accounted for more than 80 % of gaps and breaks [12]. A similar distribution was found in a population study of deer mice, where high-frequency CFSs constituted approximately 26 % of the population's total breaks and 38 % of CFSs were found only in single individuals.

Fragility of CFS

The issue of fragility at CFSs is a matter of intensive investigation. CFSs either replicate late in S phase or initiate replication in mid-S phase but exhibit a significant delay in completing it. Under conditions of RS, they may remain unreplicated even during G2 phase and up to mitosis, leading eventually to their instability (as discussed in the next section and reviewed in [13, 14]). Several features responsible for their replication sensitivity have been revealed to date. These include intrinsic structural characteristics, the presence of and overlap with large genes, differences in replication features, and epigenetic modulation [13, 14].

At the structural level, CFSs have the propensity to form secondary non-B structures that interfere with the movement of the replication fork thus leading to its collapse and associated DNA breaks [31]. Specifically, at sequence level, CFS are enriched in long stretches of AT dinucleotide-rich repeats that may form stable secondary cruciform DNA structures inducing fork stalling during DNA replication and in general incomplete or delayed DNA replication [31, 32]. In an earlier study, we performed a whole-genome analysis of CFS sequences and observed that they are on average rich in GC and Alu sequences [19]. The Alu family is a family of short interspersed repetitive elements (SINE) of about 300 bp containing mid and terminal poly A-stretches [33]. Interestingly, these elements are the most abundant mobile elements, and thus potentially recombinogenic in the human genome, and are implicated in various inherited human diseases and in cancer [34].

CFSs have been associated with genes extending over long genomic regions (“large genes”) [5]. The FHIT gene in FRA3B and WWOX in FRA16D are striking examples, measuring approximately 1.5 and 1.1 Mb respectively, compared with a mean of 10–15 kb for protein coding genes. The PARK2 gene, at approximately 1.4 Mb is also associated with FRA6E and may be down-regulated in ovarian tumors [30]. Intriguingly, genes over 800 kb may be prone to form RNA:DNA hybrid loops (termed R-loops) at sites of replication–transcription collision [35]. R-loops are structures formed by the association of the nascent transcript with the DNA template strand leaving unpaired the complementary non-coding DNA strand. Replication of large genes is time consuming and exposes the replication machinery to a risk of collision with the transcriptional machinery. In such an occurrence, the elongating RNA polymerase is blocked, leading to increased R-loop formation at Pol II pause sites. As a result, collision events may induce CFS breakage and a consequent enhancement of genomic instability [35].

Recent data have shed new light on the dynamics of the replication process at CFSs, providing several new mechanistic aspects explaining their instability (reviewed in [13, 14]). In the first report, it was shown that the stability of FRA16C is perturbed under RS, as RFs progress more slowly and stall upon encountering AT-rich regions within this site. While in the bulk genome dormant origins are activated to complete replication, these are not available within FRA16C, leading to delayed replication and instability [14] (also see B. Kerem chapter in this issue). A second report, studying the mechanistic fragility of FRA3B, showed that fork slowing and stalling are similar to those in the bulk genome, even under RS [26] (also see M. Debatisse chapter in this issue). In this case, the inability to complete replication was attributed to a large 700-kb core within this site that was found to be poor in origins. To accomplish replication of this site, forks from distant origins, located in the flanking regions, are required to come in and cover its length. The density and timing of origin firing events in the flanking regions seem to dictate the timing of FRA3B replication completion, thus influencing its stability. In a third mechanistic model, regarding the FRA6E site, both replication arrest and paucity of origin activation lead to RS sensitivity, providing a functional combination of the two previous models [36]. Interestingly, some CFSs, like FRA3B, do not stably exhibit these replication features in every cell type, suggesting that CFSs may demonstrate different, tissue-specific patterns of instability. This may also explain why distinct profiles of GI are observed at CFSs in various malignancies, characterizing each type of cancer. It would be interesting in the future to define the replication behavior of each CFS according to the specific cell type. That would be helpful in defining and possibly predicting the precise patterns of GI that take place during cancer development.

The replication density and timing of the genome have been proposed to be highly flexible and epigenetically controlled rather than directed by specific sequence motifs [13]. Therefore, an epigenetic control of CFS replication stability may also apply. In support of this is the observed H3K9/14 hypoacetylation pattern displayed by the six most expressed CFSs in lymphoblastoid cells [9]. This histone modification has been reported to be associated with chromatin compactness and increased breakage. In addition, regions with evenly spaced nucleosomes, an unusual chromatin structure preferentially formed at promoters and regulatory binding sites, have been observed in FRA3B [37].

Overall, it seems that there is no single mechanism that can explain the fragility of CFSs but rather a multitude. They depend on several characteristics, including structural properties of the FSs as well as dynamic features governing their replication that apply in a given cell type. Interestingly, they are not necessarily mutually exclusive and often can function in complementary ways [13, 14]. The only common shared aspect by all these mechanisms is that they can eventually lead to a mechanical breakage of CFSs.

Maintenance of fragile site integrity

Instability at FSs is a recognized signature of DNA damage induced by replication stress [2] and it is detected from the earliest premalignant stages [3, 4, 19]. Replication checkpoints are activated in response to the stress induced at CFSs [38]. Central to these checkpoints are the DNA damage response kinases, ATM and ATR, which respectively sense DNA double-strand breaks and RF integrity [38]. Specific targeting of these kinases in cellular models revealed that ATR disruption or hypomorphic mutations lead to chromosomal instability within CFSs even under normal replication, a phenomenon that is aggravated after low doses of APH [39]. In addition, dual inhibition of ATM and ATR using caffeine has been found to significantly increase CFS breakage compared with ATR deficiency alone, denoting also a role for ATM and a possible interplay with ATR in CFS protection [40, 41]. Inactivation of several down-stream components of the ATR network like Chk1, HUS1, Claspin, and SMC1 revealed similar effects, although not as efficient as ATR loss (Table 1) [31, 42–44].

Table 1 Factors involved in control of CFSs stability and genome integrity

These checkpoints are vital as they ensure that DNA is replicated and chromosomes are prepared for mitosis [38]. Nevertheless, CFSs exhibit an increased vulnerability to RS, leading to the activation of repair mechanisms [31]. The frequently observed presence of sister chromatid exchanges (SCEs) at the majority of CFS breaks after APH treatment suggests that homologous recombination (HR) plays a major role in response to DSBs induced under conditions of RS [45]. The FANCD2 component of the Fanconi anemia (FA) pathway has been shown to play a role not only in HR-dependent replication recovery, but also in regulating CFS stability (Table 1) (also reviewed in [31]). Similarly, activation of BRCA1 and other DSB repair proteins like RAD51 has also been found to be vital for maintaining CFS stability (Table 1) [46]. Apart from HR, the non-homologous end joining (NHEJ) pathway is also essential for chromosomal stability at these sites [47]. Specifically, knocking down Rad51, DNA-PKcs, or Ligase IV has been demonstrated to significantly increase the expression of CFSs under RS. Notably, MDC1 and γ-H2AX foci were formed and co-localized with those of Rad51 and DNA-PKcs, while γ-H2AX and phospho-DNA-PKcs foci localized at expressed FSs on metaphase chromosomes.

Other components implicated in completing replication over specific CFS regions include specialized polymerases (Table 1) [48]. These polymerases, like DNA polymerase eta (Pol η), mainly deal with DNA synthesis of complex sequences, like repetitive and secondary structures that impede replication, by performing the so-called ‘by-pass’ function. Depletion of these specialized polymerases has been shown to lead to persistence of unreplicated CFSs in mitosis. Various helicases/translocases, like BLM, WRN, and RECQ1, have been proposed to promote fork restart at CFSs in non-redundant ways through Holliday junction-mediated fork remodeling that is independent of DSB formation (Table 1) [41]. Their main purpose seems to be the resetting of structural intermediates arising from HR as well as the unwinding of DNA secondary structures, in order to facilitate replication fork restart. Alternatively, nucleases, such as the structure-specific endonuclease MUS81-EME1 and the FA pathway nuclease SNM1B/APOLLO, are responsible for DSB-mediated fork restart and/or elimination of permanently collapsed forks by cleavage of replication intermediates and consequent DNA synthesis (Table 1) [41].

Many of the above-described factors have been shown to stabilize CFSs during S-phase replication. Nevertheless, recent observations have shown that under RS, incompletely replicated or interlinked DNA at CFSs may escape S and G2–M checkpoints and enter mitosis (reviewed in [13, 49, 50]). Attempts to segregate these intermediates lead to sister chromatid entanglement followed by non-disjunction, ultimately leading to formation of ultra-fine bridges (UFBs) in anaphase cells. UFBs are defined by FANCD2/FANCI FA proteins binding to their edges, while BLM and PICH (Plk1-interacting checkpoint helicase) attach along the bridge. These persistent replication intermediates have been shown to be processed by the MUS81-EME1 nuclease in early M-phase, possibly with the help of ERCC1, to provide a controlled production of DNA breaks, aiming to allow undisturbed disjunction of sister chromatids. In case of failure, UFBs are formed during anaphase, as mentioned. At this stage, BLM helicase and PICH translocase, assisted by topoisomerase IIIα (TOPIIIα) and the BLM-associated proteins RMI1/2, function as a second line of defense by decatenating these structures and permitting chromatid segregation [50]. If unresolved UFBs still persist, they will eventually lead to chromosome mis-segregation through uneven distribution of DNA between the daughter cells and micronuclei formation. The transmitted errors at CFSs will be shielded in 53BP1 nuclear bodies in the emerging G1 daughter cells and possibly replicated in S-phase by high-fidelity polymerases. These results shed new light on CFS cleavage, and led to the proposal that, apart from being detrimental in initiating genomic instability, it can also serve as a mechanism for controlled production of DNA breaks that maintain rather than compromise genome integrity.

Several questions emerge, though, from this model, which has been established with extrinsic factors (chemical inhibitors) that induce RS. How does this model apply to, and/or differ in, pre-malignant cells that are known to undergo OIRS? Which of the two types of damage, MUS81-EME1 cleavage or the decatenation inability, contributes to the cancer-associated genome instability? A tempting but speculative model is that during premalignant stages, MUS81-EME1 cleavage activity is aberrantly increased, leading to a high frequency of DSBs, particularly at CFSs. This could be mediated by the active oncogenes that are present at such stages [2] and which in turn increase the activity of CDKs that regulate MUS81-EME1 expression [50]. As long as the checkpoints and their p53 effector are intact, the damage is repaired or the antitumor barriers of apoptosis and senescence eliminate such cells [2]. A similar scenario regarding gross genome damage due to decatenation inability may apply. Notably, it has been recently shown that under moderate RS, CFS breaks can escape from efficient ATR checkpoint surveillance, leading to mitotic tolerance of such aberrations [44]. Such a pool of cells could undergo selection for loss of checkpoint function(s) and eventually accumulate DNA damage, probably through both MUS81-EME1 processing and chromatid non-disjunctions, provided that they are compatible with survival. Eventually, progression to full malignancy will ensue. This scenario fully concurs with our previous findings showing a prevalence of CFS breakage along with the presence of UFBs and micronuclei in U2OS cells experiencing OIRS due to sustained expression of the replication licensing factor (RLF) Cdt1 [51]. Importantly, clones of these cells that “escaped” from the antitumor barriers, after prolonged Cdt1 expression, acquired a highly invasive potential. In a paradoxical way, this recently described model of deliberate, controlled CFS breakage to protect the genome integrity of cells may apply to malignant cells in the sense that it allows them to survive at the expense of genome integrity. Further expanding on this model, an emerging question concerns the effect exerted by CFS instability on the various elements, like genes and non-coding RNAs, that are located within them.

Functional elements in fragile sites

Several major publications arising from the ENCODE project [10] have underlined the importance of non-coding DNA. Non-coding regions of the genome have been found to participate in biochemical reactions with regulatory potential, such as transcription factor binding, epigenetic modifications, or long-distance interactions. Numerous functions have been attributed to non-coding RNA, including the expanding and best understood family of microRNAs (miRs) that are involved in post-transcriptional regulation of messenger RNA [52]. As a result of this progress, CFSs content may be now understood at a finer scale and the implications of CFS breakage will have to be reexamined carefully.

Fragile sites and cancer-associated genes

While available data point to an overlap between genes and CFSs [5], there is only one report showing that CFSs are denser in protein coding genes, with their distribution among fragile versus non-fragile regions varying among the chromosomes [7]. At the same time, a systematic review of the density of genes present within these genome areas is not available. To address this question, we retrieved a list of 327 genes participating in pathways in cancer from the Kyoto Encyclopedia of Genes and Genomes (KEGG) and investigated their association with both cytogenetically and molecularly mapped CFSs. We found that 110 cancer-related genes (33.6 % of all cancer-related genes) are located within CFSs (Table 2). Based on this, the density of cancer-related genes in cytogenetically defined CFSs is 37.2 % higher than in the rest of the genome (Fig. 2a).
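The density comparison described above can be sketched in a few lines. This is a minimal illustration, assuming that density means the number of elements per unit of genomic length and that the comparison is against the non-fragile remainder of the genome. The function name `relative_density_excess` is hypothetical, and the fraction of the genome covered by CFSs used in the example is an illustrative placeholder, not a figure reported in this article.

```python
def relative_density_excess(n_in_cfs, n_total, cfs_length, genome_length):
    """Percent by which element density inside CFSs exceeds density outside them.

    Density is taken as count per unit length (an assumption of this sketch).
    """
    if not 0 < cfs_length < genome_length:
        raise ValueError("cfs_length must be positive and smaller than genome_length")
    if not 0 <= n_in_cfs <= n_total:
        raise ValueError("n_in_cfs must lie between 0 and n_total")
    density_in = n_in_cfs / cfs_length
    density_out = (n_total - n_in_cfs) / (genome_length - cfs_length)
    if density_out == 0:
        raise ValueError("no elements outside CFSs; ratio is undefined")
    return 100.0 * (density_in / density_out - 1.0)


# Counts taken from the text: 110 of 327 KEGG cancer-pathway genes lie in CFSs.
# ASSUMPTION: CFSs cover 27 % of the genome. This is a placeholder chosen for
# illustration only; the real value depends on the CFS coordinates used.
excess = relative_density_excess(110, 327, cfs_length=0.27, genome_length=1.0)
print(f"{excess:.1f} % higher density inside CFSs")
```

The same function applies unchanged to any other element class (for example miRs or binding sites), as long as the counts and the fragile-region length refer to the same CFS definition.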

Table 2 Cytogenetic defined common fragile sites (CFSs) association with cancer genes and miRs
Fig. 2

Frequency of cancer-related genes, repetitive elements, miRs, binding elements, and histone marks in CFSs. a CFSs exhibit a higher density of cancer-related genes (obtained from the Kyoto Encyclopedia of Genes and Genomes), Alu repetitive elements [19], miRs, and the CTCF binding element relative to non-fragile regions. b CFSs exhibit a differential density of the histone marks (i) histone 3 lysine 27 acetylation (H3K27ac) and (ii) histone 3 lysine 4 trimethylation (H3K4me3) relative to non-fragile regions, which depends on cell type origin. Data concerning histone modifications, derived from ChIP-seq experiments belonging to the ENCODE project, were downloaded from the UCSC server (http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegMarkH3k4me3/, http://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/wgEncodeRegMarkH3k27ac/). Specifically, we obtained bigWig files for the H3K4me3 and H3K27ac modifications in the GM12878, H1-hESC, HSMM, HUVEC, K562, NHEK, and NHLF cell types. Information concerning regions of interest was extracted with the bigWigSummary utility, also available from the UCSC server. The average signal was calculated for every chromosome in every cell line by repeatedly invoking: bigWigSummary -type=mean “bigwigfile” chrN start end. Similarly, the average signal was calculated for cytogenetically and molecularly mapped fragile sites. For every defined fragile site, the mean histone modification signal of the corresponding chromosome was subtracted from the mean signal of the fragile region: signal difference from mean for site i = mean(FS_i) − mean(chromosome_{FS_i}). The chromosome means varied within each cell type (data not shown), as did the histone modifications between cell types (S histone signal). c Frequency of histone marks per CFS. Each CFS exhibits a differential density of the histone marks (i) histone 3 lysine 4 trimethylation (H3K4me3) and (ii) histone 3 lysine 27 acetylation (H3K27ac) relative to non-fragile regions, averaged over all cell types presented in Fig. 2. Using the data generated for Fig. 2b, we also plotted a boxplot for individual cytogenetically defined CFSs in all 7 cell lines mentioned above with respect to H3K4me3 and H3K27ac. Significant heterogeneity between CFSs can be observed
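The subtraction step described in the legend can be sketched as follows. This is a minimal illustration that starts from mean signals which have already been extracted (for example with bigWigSummary); the function name `signal_difference` and all numeric values are hypothetical placeholders, not measured ENCODE signals.

```python
def signal_difference(site_means, chromosome_means, site_to_chromosome):
    """Return mean(FS_i) - mean(chromosome of FS_i) for every fragile site.

    site_means: mean histone-mark signal per fragile site (one cell line)
    chromosome_means: mean signal per chromosome (same cell line)
    site_to_chromosome: which chromosome each fragile site lies on
    """
    differences = {}
    for site, site_mean in site_means.items():
        chromosome = site_to_chromosome[site]
        differences[site] = site_mean - chromosome_means[chromosome]
    return differences


# Toy inputs for a single cell line. These numbers are made up for illustration.
site_means = {"FRA3B": 1.2, "FRA16D": 2.5}
chromosome_means = {"chr3": 2.0, "chr16": 2.0}
site_to_chromosome = {"FRA3B": "chr3", "FRA16D": "chr16"}

diffs = signal_difference(site_means, chromosome_means, site_to_chromosome)
for site, diff in diffs.items():
    print(site, round(diff, 3))
```

A negative difference means the site carries less of the mark than its chromosome on average, and a positive one means more. Subtracting the chromosome mean within each cell line is what makes values comparable across cell lines whose overall signal levels differ.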

Fragile sites and microRNA genes

According to an early study, miRs are particularly frequent in CFSs [6]. Out of the 186 miRs known at the time, 35 were found in, or very close (<3 Mb) to, CFSs, occurring at a density (number of miRs per length) that was estimated to be approximately 9 % higher than in non-FSs. A newer analysis has found approximately 33.8 % of 715 miRs within CFSs, corresponding to a relative 50 % (26–85 %) higher density with regard to non-FSs, but the relation seems to vary between chromosomes [7], with some, like chromosomes 16 and 19, having many more fragile than non-fragile miRNAs, and others, like chromosome 14, having a lower incidence of miRNAs in fragile regions. Given that the number of known miRs has more than doubled since then, we have repeated this analysis with the recent version of miRBase (v20, [53]) and have found 686 miRs out of 1,871 (36.7 %) within cytogenetically defined fragile sites (Table 2) (the corresponding incidence in molecularly mapped CFSs is shown in Table 3). Thus, the relative density of miRs in cytogenetically defined CFSs is 57 % higher than in the rest of the genome (Fig. 2a; Table 2). Specific pertinent examples include tumor suppressors like hsa-mir-34a in FRA1A and oncomirs like hsa-mir-21 in FRA17B. In addition, more than 50 % of microRNAs seem to be clustered in relatively short regions, up to 50 kb, often containing multiple miR isoforms belonging to the same family [54]. When mapped, approximately 28 % of miR clusters overlap with known FSs. Rearrangements within these regions can disrupt multiple miRs in a single hit and produce complex phenotypic changes.
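The cluster mapping mentioned above amounts to an interval overlap test. The sketch below shows one way such a fraction could be computed, assuming that both miR clusters and fragile sites are given as half-open (chromosome, start, end) intervals. The function names and all coordinates are hypothetical and serve only as an illustration; they are not the procedure or data used for the figures quoted in the text.

```python
def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def fraction_overlapping(clusters, fragile_sites):
    """Fraction of clusters that overlap at least one fragile site.

    Both arguments are lists of (chromosome, start, end) tuples.
    """
    if not clusters:
        return 0.0
    hits = 0
    for chrom, start, end in clusters:
        if any(
            chrom == fs_chrom and overlaps(start, end, fs_start, fs_end)
            for fs_chrom, fs_start, fs_end in fragile_sites
        ):
            hits += 1
    return hits / len(clusters)


# Toy coordinates, invented for illustration only.
clusters = [("chr1", 100, 150), ("chr1", 900, 950), ("chr2", 100, 150), ("chr3", 10, 20)]
fragile_sites = [("chr1", 120, 500), ("chr3", 0, 15)]

print(fraction_overlapping(clusters, fragile_sites))
```

The linear scan is adequate for a few thousand miRs and roughly a hundred fragile sites; an interval tree or a sorted sweep would only matter for much larger inputs.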

Table 3 A list of molecularly delimited common fragile sites (CFSs) obtained from a manual search of the literature

Fragile sites and regions with regulatory potential

Other DNA elements, such as regions with regulatory potential, may also be contained in or overlap with CFSs. As an example, CTCF binding sites from ENCODE ChIP-seq are distributed throughout the genome. CTCF is a critical “weaver” of chromatin structure and function, and can provide an anchor point for nucleosome positioning [55]. Indirectly, CTCF can influence the accessibility of chromatin and plays diverse roles in chromatin insulation, gene regulation, imprinting, intra/interchromosomal interactions, nuclear compartmentalization, and alternative splicing [56, 57]. In some cases, distal fragments bound by CTCF have been found to mediate long-range interactions by loop formation and could modulate transcription at distant sites [58]. We examined the percentage of all potential CTCF binding sites within molecularly mapped CFSs and found that this ranges between 2.76 and 3.20 % in different cell lines (Table 4). Relative to the total length of fragile segments, this corresponds to an 18 % (10–25 %) increase in the number of potential CTCF binding sites (Fig. 2a). Although it is impossible to know whether CTCF binding at these sites exerts a meaningful effect, its presence seems in accordance with the observation that many CFSs are gene-rich. Moreover, CFS rearrangements could influence gene expression further away on the same chromosome.

Table 4 Data from ENCODE [10] with respect to molecularly mapped common fragile sites (CFSs) from Table 3

Fragile sites and histone modifications

Epigenetic modifications, such as histone methylation and acetylation, can also take place within FS. It appears that H3K9/14 hypoacetylation is a global feature of CFSs in a lymphoblastoid cell line [9] and could impede replication progression. Several enzymes modify key histone residues with relatively high specificity and regulate, indirectly, transcription, repair, and replication. Indeed, histone modifications vary significantly between cell types and are well correlated with transcription levels in the ENCODE data (Figure 2 in [10]), especially for H3K79me2, H3K9ac, H3K4me3, and H3K27ac. Histone 3 lysine 27 acetylation (H3K27ac) co-localizes with active enhancers [59] and regions with open chromatin structure and could play a role in protecting against replication–transcription collisions and R-loop formation. Intriguingly, H3K27ac varies between cell types and CFSs (Fig. 2b, c). The average ChIP-seq acetylation signal and the number of signal peaks (data not shown) within cytogenetic CFSs are slightly lower than the corresponding chromosome mean in K562 cells, but higher in HUVEC cells and equivocal for other cell types. Such variability may explain the plasticity of CFSs and their differential expression between individuals, cell types, and culture conditions.

Histone 3 lysine 4 trimethylation (H3K4me3) is also associated with active promoters [60] and correlates well with transcription in the ENCODE data [10]. It appears that H3K4me3 may also contribute to the DNA damage response and repair of DSBs in yeast cells [61], mediating cellular responses to genotoxic stresses, and interacting with the human tumor suppressor ING1, which is required for DNA repair and apoptotic activities [62]. A recent study of ERFSs [9] has shown that replication stalling, as identified by anti-replication protein A ChIP, preferentially co-localizes with H3K4me3 (see supplemental figure 1 in [16]). Although cytogenetically defined CFSs as a whole do not show a large deviation from the mean, some sites in particular, like FRA3B and FRA16D, seem to be on average poor in H3K4me3, while others, like FRA2E, FRA3C, and FRA7D, seem to be on average rich in H3K4me3 (Fig. 2c). When the cell lines employed were grouped according to their origin (cancerous versus embryonic versus normal epithelial versus normal mesenchymal), a pattern in the density of H3K27ac and H3K4me3 within CFSs relative to non-fragile sites could be discerned (Fig. 2b). Cancer and embryonic cells (K562, GM12878, H1-hESC) displayed lower signals of H3K27ac and H3K4me3 relative to the non-fragile regions, whereas a significantly different distribution was noticed in the other cell line groups (p < 0.001, ANOVA). Despite the small number of cell lines examined, a possible functional link between histone modifications and the other elements (genes, non-coding RNAs, regulatory sequences) positioned within CFSs cannot be excluded (Fig. 1). This potential interplay may be even more complex during carcinogenesis. Oncogenes may disrupt this functional cross-talk by altering the epigenome of a particular region. As an example, oncogenic Cdc6 was shown to act as a “molecular switch” at certain tumor-suppressor loci by regulating CTCF binding. This led to suppression of the encoded genes and simultaneous firing of adjacent dormant origins. If such a scenario takes place within CFSs that are rich in CTCF sites (Fig. 2a), then, depending on the cellular context, the density and timing of firing origins may be altered, affecting replication dynamics [56, 57].
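The group comparison mentioned above relies on a one-way ANOVA. The sketch below shows how the F statistic of such a test is computed, assuming each cell line contributes one value (for example, a CFS-to-background signal ratio). The group names mirror the grouping in the text, but every number is hypothetical and the function is an illustration, not the procedure actually used for Fig. 2b.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups.

    F = (between-group sum of squares / (k - 1))
        / (within-group sum of squares / (n - k)),
    where k is the number of groups and n the total number of values.
    """
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical CFS-to-background histone signal ratios, one per cell line.
ratios_by_origin = {
    "cancer_or_embryonic": [0.8, 0.9, 1.0],
    "normal_epithelial": [1.1, 1.2, 1.3],
    "normal_mesenchymal": [1.4, 1.5, 1.6],
}
f_stat = one_way_anova_f(list(ratios_by_origin.values()))
print(f"F statistic: {f_stat:.2f}")
```

Converting F into a p value requires the F distribution with (k - 1, n - k) degrees of freedom, which the standard library does not provide; a statistics package such as SciPy (`scipy.stats.f_oneway`) would normally be used for that step.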

Our current understanding of CFSs is traditionally based on static mapping, often cytogenetic and imprecise, which cannot fully capture how non-coding DNA, regulatory elements, and histone modifications interact with vulnerability to RS. Even though the current CFS mapping successfully predicts the response to extrinsic stress and OIRS, a more accurate model of fragility will eventually have to integrate experimental data at nucleotide resolution with other non-coding elements. This is an intriguing area for further study.

Fragile sites in carcinogenesis

Extending the concept of CFSs to carcinogenesis has been a subject of active research since the discovery of an association between cancer breakpoints and FSs [63]. Overall, it appears that CFSs are generally sensitive to innate RS occurring naturally in various tumors and cell lines [64]. Multiple clusters of homozygous deletions, usually small, have been detected over known CFSs in an exhaustive survey of cancer genomes, but their expression profile is variable [65]. For example, FRA2F, FRA3B, FRA4F, FRA5H, and FRA16D were most affected, while others, like FRA2B and FRA4B, were least affected. Recurrent alterations have been identified in FRA3B and FRA16D in several cancer types, leading to further investigation of the FHIT and WWOX genes, respectively, in mouse models (see K. Huebner and R. Aqeilan chapters in this issue). Fragile sites FRA10C and FRA10G may be involved in the formation of the oncogenic RET/PTC rearrangement in papillary thyroid carcinoma [15]. Specifically, in RET/PTC1, the FRA10G-localized RET is rearranged with the FRA10C-localized tumor suppressor gene CCDC6, while in RET/PTC3 it is rearranged with NCOA4, which is also located in FRA10G. In a similar manner, the MYC oncogene is flanked by the CFSs FRA8C and FRA8D, which may facilitate adjacent integration of HPV18 [66] or MYC amplification [67]. Viral DNA integration in the genome of a host cell can lead to cancer development, and CFSs provide preferential hotspots for this [68]. In particular, the HPV16 E6 and E7 oncogenic products have been shown to induce replication stress and DSBs in the host cell. This occurs preferentially at CFSs, allowing viral genome integration at these sites [68].

Despite the abundance of CFS breaks in cancer, it would be inappropriate to assume that alterations of genes residing within CFSs always confer a clonal advantage in cancer development [65] without evidence of selection or at least convincing causative models. Clearly, breakage probability (passenger alterations) as a consequence of fragility and clonal selection (driver alterations) in cancer development are two separate phenomena that should not be confused.

Nevertheless, the impact of CFS instability in cancer should not be easily dismissed or oversimplified. CFS breakpoints have been detected in preneoplastic lesions in human and mouse models [4, 19] well before the emergence of the malignant phenotype. Briefly, exposure of xenografted normal human skin to growth factors preferentially induces CFS instability. Similarly, hyperplastic mouse urothelium from HRAS transgenic mice showed numerous copy number alterations in fragile areas. In these studies, CFS instability is an early manifestation and can be attributed to experimentally controlled, oncogene-induced stress, which resembles carcinogenesis more closely than APH-induced stress does. Therefore, it could be argued that CFS alterations are frequent in cancer, as described above [65], not just because of a higher breakage probability but also because of an earlier involvement, even before the complete deregulation of the cellular machinery.

Furthermore, any double-strand break can have dire consequences, such as the initiation of a breakage-fusion-bridge cycle, especially when subtelomeric and peri-centromeric CFSs are disrupted simultaneously [69]. Through this mechanism, CFS breaks can amplify oncogenes, delete tumor suppressors or, most importantly, initiate persistent chromosomal instability. Massive accumulation of localized chromosomal rearrangements at a single time-point, termed chromothripsis (literally: shattering of the chromosome) and chromoanasynthesis, has recently been identified in several cancer types [70]. Indirect evidence suggests that CFSs may have an important role in this process [71] by stalling the RF, favoring RF collapse and, in extreme cases, chromosome pulverization leading to clustering of chromosomal breaks [13]. Indeed, chromosome fragmentation distal to the CFS has been observed under the microscope in some cases [71] and could be a triggering factor for chromothripsis. On the other hand, multi-step, recurrent CFS alterations could be difficult to discriminate from single-step rearrangements, rendering the identification of chromothripsis even more difficult. In that scenario, CFS stability and localization are important parameters in the bioinformatic algorithms that are applied to define and model such cancer rearrangements. In addition, CFSs can contribute to the clustered shattering of chromosomes during premature chromatin condensation (PCC) [72, 73], in which interphase chromosomes, late-replicating chromosome zones such as CFSs, or extranuclear bodies (micronuclei) are induced to condense by various mitotic factors [74]. This reveals more possibilities for CFSs to act as contributors to GI through chromosome breakage. An interesting scenario suggested that the mammalian G2–M checkpoint can fail to delay mitotic onset because it may not be sensitive enough to detect a few remaining long-replicating forks, thus allowing chromatin condensation of late-replicating CFS regions and resulting in multiple DNA breaks [26, 44].

CFSs as “functional” units: a new perception

Common fragile sites have long been considered vulnerable breakage sites in the genome in response to RS from extrinsic factors. Their fragility has also been associated with GI in cancer development. As we have previously shown, CFSs are preferentially affected from the earliest precancerous lesions, in response to OIRS [2, 4]. In the current work, we first reviewed the heterogeneity of these sites and the fragility mechanisms affecting them. Next, by applying bioinformatic tools and exploiting available information in various databases, like KEGG, miRbase, and ENCODE, we show a prevalence of various cancer-related genes, miRs, binding elements, and histone modifications in CFSs (Figs. 1, 3). The presence of such a wide spectrum of coding and non-coding elements changes the view of the content of CFSs and of their very nature. Given that CFSs are altered from the earliest stages in cancer, their impact on cancer development may be more profound than simply participating in the emergence of GI. On one hand, cancer-related genes and miRs may be affected from such early precancerous stages, therefore possibly exerting a strong pressure for malignant progression (Fig. 3). On the other hand, this pressure is also reinforced by alterations and imbalances in the binding elements and histone patterns, respectively, in the CFSs. Furthermore, collectively, all these alterations may further affect, in an “avalanche” mode, the stability not only of the CFSs but of the genome overall (Figs. 1, 3). Therefore, as the anti-tumor barriers are gradually overwhelmed, this avalanche effect may function in a positive feedback mode to promote cancer. An important question that emerges is why CFSs are not selected for elimination from the genome but are instead conserved features in mammals. A tempting but speculative answer is that, by locating a set of important coding and non-coding elements in regions that replicate late and/or with delay and are thus prone to instability, they may function as alarm sensors scattered throughout the genome on various chromosomes, signaling detrimental effects of RS on the cell. As long as the mammalian checkpoints and repair mechanisms are not compromised, cells can monitor and protect their genome and functional integrity through such a dynamic interaction. Nevertheless, this carries the risk that if the checkpoints and the anti-tumor barriers gradually fail, tumor promotion ensues (Fig. 3). Because we were able to examine only a small subset of binding elements and histone modifications from ENCODE, and because miRbase is constantly expanding, more in-depth studies will be required to obtain a comprehensive picture of CFSs and of their role in cancer. Overall, CFSs may not be merely structural domains vulnerable only to breakage but highly organized “functional” units that may have deeper biological consequences for the cell when affected.
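The kind of topological association discussed above (which annotated elements fall within which CFSs) reduces to an interval-overlap count. The following is a minimal sketch under stated assumptions: sites and elements are given as half-open (chromosome, start, end) intervals, the site names and all coordinates are hypothetical placeholders rather than real CFS boundaries, and the function is not the pipeline used to produce Figs. 1 and 3.

```python
def intervals_overlap(start_a, end_a, start_b, end_b):
    """True if half-open intervals [start_a, end_a) and [start_b, end_b) share a base."""
    return start_a < end_b and start_b < end_a


def count_elements_in_sites(sites, elements):
    """Count, for each named site, the elements overlapping it on the same chromosome.

    sites:    dict mapping site name -> (chromosome, start, end)
    elements: iterable of (chromosome, start, end), e.g. genes, miRs, or peaks
    """
    counts = {name: 0 for name in sites}
    for name, (site_chrom, site_start, site_end) in sites.items():
        for elem_chrom, elem_start, elem_end in elements:
            if elem_chrom == site_chrom and intervals_overlap(
                site_start, site_end, elem_start, elem_end
            ):
                counts[name] += 1
    return counts


# Hypothetical placeholder coordinates, not real fragile-site boundaries.
toy_sites = {"site_A": ("chr1", 100, 200), "site_B": ("chr2", 50, 80)}
toy_elements = [
    ("chr1", 150, 160),  # fully inside site_A
    ("chr1", 190, 250),  # partially overlaps site_A
    ("chr1", 200, 210),  # starts exactly at the end of site_A: no overlap
    ("chr2", 10, 60),    # partially overlaps site_B
    ("chr3", 50, 80),    # different chromosome
]
overlap_counts = count_elements_in_sites(toy_sites, toy_elements)
print(overlap_counts)
```

The nested loop is adequate for a few hundred sites; genome-scale annotation sets would normally be handled with sorted intervals or an interval tree, and the counts would be compared against size-matched non-fragile regions before any enrichment is claimed.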

Fig. 3

Model proposing that CFSs, apart from contributing to GI, exert wider biological effects during cancer development. CFSs are preferentially affected from the earliest precancerous lesions, in response to OIRS, contributing to GI. A wide spectrum of coding and non-coding elements is present within CFSs. Cancer-related genes and miRs may be affected from such early precancerous stages, therefore possibly exerting a strong pressure for malignant progression. This pressure is also reinforced by alterations and imbalances in the binding elements and histone patterns, respectively, in the CFSs. Furthermore, collectively, all of these alterations may further affect, in an “avalanche” mode, the stability not only of the CFSs but of the genome overall. As the anti-tumor barriers are gradually overwhelmed, this avalanche effect may function in a positive feedback mode to promote cancer.