Introduction

Deoxyribonucleic acid (DNA) is the macromolecule that stores the genetic information in nearly all living organisms. It is composed of two antiparallel strands coiled around each other in the form of a right-handed helix. For cells to transmit this genetic information to their progeny, DNA needs to be accurately duplicated during the S phase of the cell division cycle, and the resulting sister chromatids must be segregated faithfully to the daughter cells in mitosis. However, the same DNA template is also a substrate for the transcription machinery, which produces the various forms of RNA. Transcription has the potential to interfere with DNA replication, because some genes in human cells are so large that their transcription can take a whole cell cycle to complete, during which time replication of the same locus has to occur. Moreover, some genes, such as those encoding ribosomal RNAs, must be continuously transcribed to provide a ready supply of ribosomes for protein synthesis. As a result, some regions of the genome have to conduct DNA replication and transcription at the same time, inevitably increasing the probability of a damaging conflict.

DNA replication is initiated at multiple replication origins in eukaryotic cells. During the G1 phase, replication origins are licensed through the recruitment of the origin recognition complex (ORC), which, in turn, recruits Cdc6 and Cdt1, and, subsequently, the non-activated form of the replicative DNA helicase, MCM2–7, to form the so-called pre-replication complex (pre-RC) (Boos et al. 2013). Each licensed origin contains a head-to-head assembled, double-hexameric MCM2–7 complex. Once cells have entered S phase, replication firing factors, including Cdc45 and the GINS complex, associate with MCM2–7 to establish the Cdc45∙MCM2–7∙GINS (CMG) complex (the CMG helicase). Other replication factors (including Mcm10 and Ctf4) are known to facilitate the activation of the CMG helicase and the recruitment of the replicative DNA polymerases (Thu and Bielinsky 2013; Zhu et al. 2007). The CMG complex then initiates DNA unwinding to trigger DNA synthesis (Bleichert et al. 2017). Once a DNA origin is opened to create a replication bubble, two replication forks are established that travel in opposite directions as replication proceeds.

In common with replication, transcription begins at specific genomic regions. To initiate transcription in eukaryotic cells, general transcription factors first bind to a gene promoter, which assists in the recruitment of RNA polymerase (RNAP). RNAP must be activated to catalyze the process of transcription through both transcription factor II H (TFIIH)-mediated DNA denaturation and RNAP phosphorylation. Together with numerous transcription elongation and RNA processing factors, RNAP then catalyzes transcription elongation (Bentley 2014). Unlike replication, transcription does not occur in isolation, and it is always coupled with downstream events required for RNA processing (e.g., RNA splicing). Therefore, the large RNAP complex and its associated RNA processing factors present a real challenge for any advancing replication fork that might be encountered.

Since DNA replication and transcription can occur on the same stretch of template DNA at the same time, the multi-enzyme machineries required to catalyze these two processes have the potential to collide with each other. While this might appear to be a catastrophic event for a cell, in most cases, cells can proliferate efficiently despite this potential threat. Therefore, cells must possess defined strategies to orchestrate these two processes to prevent or minimize collisions. Nevertheless, mounting evidence indicates that replication and transcription collisions do occur within certain genomic loci, and that these conflicts are a natural source of genomic instability (Garcia-Muse and Aguilera 2016; Lin and Pasero 2012). In this review, we discuss the strategies employed by cells to coordinate replication and transcription and to limit damaging collisions. We will also discuss the pathways involved in resolving replication-transcription collisions that do occur in the human genome, particularly those arising at specific loci that are deemed to be ‘difficult-to-replicate’, such as fragile sites, the ribosomal DNA (rDNA) genes, and telomeres.

Strategies for coordinating replication and transcription

In bacteria, collisions between the replication and transcription machineries are unavoidable, because the genome is being replicated and transcribed constantly during favorable growth conditions. Gene transcription in bacteria such as E. coli is biased in the genome, such that most genes are transcribed and replicated in the same direction (to avoid head-on collisions). Nevertheless, the replisome travels on the DNA template much faster than does RNAP, and therefore there are inevitable co-directional collisions where the replisome encounters the transcription machinery from behind (Helmrich et al. 2013). In the case of eukaryotic cells, where the genome is much larger, and both replication and transcription can start at multiple loci simultaneously, strategies have evolved to coordinate DNA replication and transcription in a more sophisticated manner to avoid collisions between these two processes.
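
The co-directional case can be illustrated with simple arithmetic: a fork that moves faster than an RNAP travelling ahead of it closes the gap at the difference between the two speeds. The following Python sketch is illustrative only. The function name and the numbers used (a 10 kb head start, 600 and 50 nucleotides per second) are placeholder assumptions, not values taken from the studies cited above.

```python
from typing import Optional


def catch_up_time(head_start_nt: float,
                  fork_speed_nt_s: float,
                  rnap_speed_nt_s: float) -> Optional[float]:
    """Seconds until a replication fork reaches an RNAP moving the same way.

    Returns None when the fork is not faster than the RNAP, in which case
    it never catches up under this constant-speed model.
    """
    closing_speed = fork_speed_nt_s - rnap_speed_nt_s
    if closing_speed <= 0:
        return None
    return head_start_nt / closing_speed


# Assumed, illustrative values only (not measurements from the text).
seconds = catch_up_time(head_start_nt=10_000,
                        fork_speed_nt_s=600.0,
                        rnap_speed_nt_s=50.0)
print(f"Fork reaches the RNAP after about {seconds:.1f} s")
```

Under this simplified model, any speed advantage of the replisome eventually produces an encounter unless the RNAP terminates first, which is why co-orientation alone cannot eliminate collisions.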

Spatial separation of replication and transcription

About 20 years ago, analyses in mouse and human fibroblasts showed that sites of DNA replication and transcription are organized within distinct territories throughout the whole period of S phase. This suggests that replication and transcription are spatially separated to avoid the machineries occupying the same section of the genomic template at the same time (Wei et al. 1998) (Fig. 1a). Moreover, transcription-dependent gene looping has the potential to insulate transcription machineries from replication forks, thus limiting collisions (Bermejo et al. 2007). Consistent with this proposal, DNA topoisomerase II is essential both for gene looping and for avoiding replication and transcription collisions (Bermejo et al. 2012; Li et al. 1999).

Fig. 1

Strategies employed by eukaryotic cells to coordinate replication and transcription. a Sites of DNA replication and transcription occur in separate domains in either early or late S phase, particularly in the nucleolar area where the rDNA loci reside. b Timing of transcription and replication is highly regulated. Most of the transcription occurs before S phase and fades in S phase. In mammalian cells, the transition from G1 to S phase is controlled by the Rb/E2F complex. In the absence of Rb, E2F1 binds to the promoters of target genes encoding proteins involved in DNA replication (e.g., DNA polymerases, thymidine kinase, dihydrofolate reductase, and CDC6), and chromosomal replication (e.g., replication origin-binding proteins ORC1 and MCM5). As a result, it facilitates the G1/S transition and S-phase progression. When cells are not proliferating, E2F is bound to Rb protein, contributing to transcriptional suppression. c In eukaryotic cells, there is a bias towards co-orientation of replication and transcription. Top panel, in areas where highly transcribed or large genes reside, replication initiation sites frequently co-localize with transcription regulatory sites including enhancers or promoters. Bottom panel, in the regions where genes are highly transcribed, i.e., rDNA regions, sections of specific DNA sequences are bound by proteins (e.g., Fob1 or Sap1 in yeasts) that can serve as barriers for the replication fork coming from the opposite direction to that of transcription

Temporal separation of replication and transcription

It has been demonstrated that more than 95% of the sites of ongoing replication do not overlap with sites of transcription in early S phase (Wei et al. 1998), suggesting that replication and transcription can also be separated in a temporal manner (Fig. 1b). Indeed, in the heavily transcribed rDNA regions where transcription and replication cannot be easily divided spatially, replication timing varies in the different units of these highly repetitive regions. For example, it has been shown that rDNA regions containing actively transcribed units replicate in early S phase, while those regions that do not have active units replicate late in S phase (Dimitrova 2011). More generally, nascent RNA sequencing analysis has confirmed that the timing of DNA replication and that of transcription differ globally in human cells (Meryet-Figuiere et al. 2014). The molecular mechanisms underlying the regulation of replication timing remain to be fully elucidated, although several advances have been made in the past decade. For example, the high-resolution chromosome conformation capture technique has revealed that the formation of topologically associated domains (clusters of self-interacting regions) is vital for minimizing replication and transcription conflicts (Marchal et al. 2019; Pope et al. 2014).

It would be predicted from the above discussion that an alteration in replication timing would provoke collisions between the replication and transcription machineries, leading to genome instability (Almeida et al. 2018; Herlihy and de Bruin 2017). Indeed, many proteins identified as playing a role in controlling replication timing (either locally or globally) are also involved in the cellular response to DNA damage generated during S phase (Yamazaki et al. 2013). For example, in fission yeast, the telomere-binding protein, Taz1, binds to late replication regions to regulate replication timing (Tazumi et al. 2012), and is required for the prevention and repair of DNA breaks at these loci (Foti et al. 2016). In budding yeast, Fkh1 and Fkh2 control replication initiation timing globally through influencing origin clustering (Knott et al. 2012), and their function is closely related to cell-cycle regulation, cellular stress responses, and aging (Murakami et al. 2010). In mammalian cells, RIF1 is an important, non-replisome-associated protein that plays a genome-wide role in the regulation of replication timing (Yamazaki et al. 2013). This is apparently achieved through its combined ability to modulate the interaction between different replicating domains in G1 cells, and to bind to late-replicating regions to prevent unscheduled origin firing (Cornacchia et al. 2012; Foti et al. 2016; Yamazaki et al. 2012).

Besides replication timing, transcription patterns are also strictly regulated throughout the cell cycle, especially in readiness for the transition from G1 to S phase (Wittenberg and Reed 2005). It is well established that transcription of genes required for S phase is tightly controlled by the E2F family of transcription factors (Dimova and Dyson 2005). This wave of transcription starts in early G1 phase, peaks at the G1/S boundary, and then fades during the early stages of S phase (Bertoli et al. 2013). One of the proposed biological reasons behind this is to prevent the interference between replication and transcription (Fig. 1b). Consistent with this idea, overexpression of oncogenes, such as those encoding MYC or Cyclin E, can lead to increased E2F activity and premature S phase entry. As a consequence, the global G1 gene transcription program fails to be completed before S phase entry, leading to a high level of transcription throughout S phase (Bertoli et al. 2016). Through examining nascent DNA synthesis in early S phase under conditions of Cyclin E or MYC overexpression in human cells, unscheduled DNA synthesis from origins located in intragenic regions has been observed (Macheret and Halazonetis 2018). These intragenic origins are normally disabled by transcription during G1 (Macheret and Halazonetis 2018), which suggests that their inappropriate firing was caused by premature S phase entry, because transcription did not have sufficient time to displace the pre-replicative complexes from these origins. As a result, conflicts between replication and transcription would occur when these origins fire, potentially leading to DNA double-strand break formation and the subsequent chromosomal rearrangements (Macheret and Halazonetis 2018).

Co-orientation of replication and transcription

In a situation where replication and transcription must occur at the same time in a particular genomic locus, cells try to avoid conflicts between these two machineries by creating a bias in favor of co-directional movement of the replication and transcription machineries. In bacterial cells, it is well established that highly transcribed rRNA, tRNA, and other essential genes are almost exclusively co-directional with replication fork progression (Guy and Roten 2004; Rocha and Danchin 2003). In eukaryotic cells, several features of genome organization are utilized to provide a similar co-directional bias. Replication initiation sites are frequently found within transcription regulatory units, including enhancers and promoters. For example, replication origins in S. cerevisiae contain consensus sequence elements and are recognized by transcription factors (Herrgård et al. 2006; Miotto et al. 2016). In mammals, although no such consensus elements are evident, replication origins tend to overlap with enhancers and promoters (Cadoret et al. 2008; Miotto et al. 2016; Sequeira-Mendes et al. 2009). Recently, cis elements involved in controlling replication timing have been identified in mouse embryonic stem cells (Sima et al. 2019). These elements often possess the features of enhancers or promoters, reinforcing the notion that replication and transcription are coupled. In addition, numerous lines of evidence indicate that transcriptional initiation activity defines those replication origins that will fire efficiently (Candelli et al. 2018; Chen et al. 2019; Sequeira-Mendes et al. 2009). This, in turn, enables replication and transcription to occur in a co-directional fashion. Consistent with this, using an Okazaki fragment sequencing technique in human RPE-1 cells, Chen et al. showed that replication origins preferentially fire near RNAPII-occupied transcription start sites of highly transcribed genes, creating a strong bias towards co-directional replication and transcription (Chen et al. 2019) (Fig. 1c).

Another strategy to impose bias is to use the so-called ‘replication fork barriers’, which are found in many eukaryotic cells, and enable co-directional replication and transcription at the highly transcribed rDNA loci (Gerber et al. 1997; Little et al. 1993). These replication fork barriers are composed of defined DNA sequences and are tightly bound by non-nucleosomal proteins (e.g., Fob1 in S. cerevisiae and Sap1 in Schizosaccharomyces pombe). They are located at the 3′ termini of rDNA transcriptional units to ensure that any replication forks progressing in the opposite direction to the direction of transcription will be stalled before colliding with RNA polymerase I (Muller et al. 2000; Pasero et al. 2002) (Fig. 1c). Similar pausing sites have been found at tRNA regions in yeast, and are required to stop replication forks only if they are progressing in the opposite direction to that of tRNA transcription (Deshpande and Newlon 1996; Ivessa et al. 2003).

Consequences of conflicts between replication and transcription

Despite the strategies adopted by cells to prevent the DNA replication and transcription machineries from colliding with each other, it is inevitable that such collisions occur occasionally. It is generally believed that co-directional encounters are less detrimental to cells. Indeed, in vitro, it was shown that a reconstituted E. coli replisome can displace or bypass a co-directional RNAP, and can even use the RNA as a primer to conduct DNA synthesis under some circumstances (Liu et al. 1993; Pomerantz and O’Donnell 2008). Whether this mechanism exists in living bacteria (or in eukaryotic cells) remains unclear. However, even co-directional encounters are not always benign. For example, using a well-defined episomal replication system in human cells, Hamperl et al. showed that co-directional conflicts promote an ATM-dependent DNA-damage response, suggesting that these conflicts lead to DNA damage (Hamperl et al. 2017).

Understandably, head-on encounters are considered to be more deleterious to fork progression and are widely associated with the creation of DNA damage (Deshpande and Newlon 1996; Merrikh et al. 2012; Prado and Aguilera 2005). The negative consequences of a head-on encounter go beyond those created by a direct collision between the replication and transcription machineries. For example, when the replication and transcription machineries converge, DNA torsional stress in the form of positive supercoiling accumulates in the intervening region, inhibiting helix unwinding and thus preventing continued replication fork progression (Bermejo et al. 2012) (Fig. 2). Indeed, in both yeast and human cells, DNA topoisomerases I and II associate with replication forks, and are required to prevent replication and transcription interference (Bermejo et al. 2012; Tuduri et al. 2009). Head-on encounters can also induce pathological R-loop formation involving the nascent RNA transcript (Fig. 2). R-loops are three-stranded nucleic acid structures containing one DNA–RNA hybrid and a displaced single strand of DNA. These structures can directly block DNA replication, leading to fork stalling or collapse, and can induce a DNA-damage response (Aguilera and García-Muse 2012; Santos-Pereira and Aguilera 2015). Consistent with this, R-loops have been shown to be significantly enriched at sites where head-on encounters are prevalent (Hamperl et al. 2017).

Fig. 2

Summary of the pathways involved in preventing or resolving replication and transcription conflicts in eukaryotic cells in S phase. When conflicts between replication and transcription are unavoidable, several pathways are available to lessen the impact of or resolve the conflicts. First, the newly synthesized RNAs are coated with RNA processing proteins to prevent R-loop formation. These proteins are referred to as messenger ribonucleoprotein complexes (mRNPs, e.g., the THO complex and splicing factors ASF/SF2). Second, R-loops can be resolved by either RNaseH or helicases such as SETX (Senataxin) or DHX9. In some cases, the displaced DNA strand might form a G-quadruplex (G4) that would stabilize the R-loop structure. Third, stalled RNAP blocking the replisome can be removed by either the transcription-coupled nucleotide excision repair pathway (TC-NER), Def1-mediated, proteasome-dependent degradation, or via the Dicer-mediated digestion of the newly synthesized RNA and the subsequent degradation of RNAP. Finally, anti-backtracking of RNAP facilitated by TFIIS and RECQL5 in human cells may also prevent potential collisions between transcription and replication. With the above measures, the conflicts between replication and transcription could be resolved even in the case of a head-on collision. When, however, the conflicts cannot be resolved, the DNA between the two machineries might become positively supercoiled, while DNA behind the transcription machinery becomes negatively supercoiled, which would increase the chance of R-loop formation and subsequent genome instability

Pathways for resolving replication–transcription conflicts

Fortunately, when a collision between the replication and transcription machineries does occur, cells are equipped with efficient surveillance and repair pathways to deal with the wide range of atypical and/or undesirable structures that might be formed. These include DNA secondary structures, such as G-quadruplexes and hairpins, R-loops, damage to the DNA template (e.g. collapsed replication forks), as well as backtracked or stalled RNAPs. Indeed, genome-wide analyses have indicated that transcribed genes are natural pausing sites for DNA replication (Azvolinsky et al. 2009), indicating that cells must have efficient means to overcome these impediments.

R-loop prevention and resolution

As discussed above, R-loops are one of the by-products of transcription, and are very stable in vitro and in vivo. R-loops can arrest both DNA replication forks and RNA polymerases (Aguilera and García-Muse 2012; Belotserkovskii and Hanawalt 2011; Belotserkovskii et al. 2010, 2017). A current model for how co-transcriptional R-loops are formed is the ‘thread-back’ model, where the newly synthesized RNA can invade the transiently opened template DNA (Aguilera and García-Muse 2012). One of the main strategies to prevent co-transcriptional R-loop formation is the coupling of transcription to mRNA processing and export. The newly synthesized RNA is immediately bound by proteins, including the THO complex and splicing factors ASF/SF2, which prevents the nascent RNA from hybridizing to the template DNA (Aguilera and García-Muse 2012; Huertas and Aguilera 2003; Li and Manley 2005) (Fig. 2). Another strategy is to prevent the accumulation of negative supercoiling behind the advancing RNAP, which would otherwise facilitate helix opening and, hence, RNA/DNA annealing. This function is dependent on DNA topoisomerase I, whose role in limiting R-loop formation is well documented (Drolet et al. 1995; El Hage et al. 2010; Tuduri et al. 2009).

Cells have also developed multiple strategies to resolve R-loops and, hence, minimize their deleterious effects. First, the RNA component can be degraded. In yeast and human cells, ribonuclease H1 (RNase H1) and ribonuclease H2 (RNase H2) can specifically degrade the RNA strand of a DNA/RNA hybrid. In human cells, RNase H1 seems more important in the context of transcription-associated R-loops, and has a genome-wide role in resolving R-loops. However, a recent study demonstrated that RNase H2 plays an essential role in R-loop processing and ribonucleotide excision repair in the G2 and M phases in yeast (Lockhart et al. 2019). RNase H1, by contrast, can function independently of the cell cycle to remove R-loops and appears to become activated in response to high R-loop loads in S phase. It would be interesting to determine if this is also the case in human cells. Second, the R-loop can be resolved after it has encountered a replication fork. The fork stalling generated under these circumstances activates the Fanconi anemia–BRCA pathway to remove R-loops, ensuring the complete duplication of the genome in S phase (García-Rubio et al. 2015; Madireddy et al. 2016; Schwab et al. 2015). Third, certain helicases have been identified that disassemble R-loops at specific loci, including Senataxin (SETX) and DHX9. The RNA/DNA helicase SETX is one of the best characterized R-loop-binding factors. Mutations in the SETX gene have been found in two neurodegenerative disorders: ataxia with oculomotor apraxia type 2 (AOA2) and amyotrophic lateral sclerosis type 4 (ALS4) (Chen et al. 2004; Moreira et al. 2004). It was demonstrated that BRCA1 can recruit SETX to the termination regions of gene transcription where R-loop formation is known to occur naturally, and this prevents the accumulation of single-strand DNA breaks (SSBs) at those regions (Hatchi et al. 2015). Most interestingly, when SETX is conjugated to SUMO2/3, it can resolve transcription–replication encounters by interacting with the 3′–5′ exonuclease exosome complex via the EXOSC9/RRP45 subunit (Richard et al. 2013) (Fig. 2).

RNAP complex clearance and prevention of backtracking

In E. coli, two DNA repair proteins, Mfd and UvrD, can displace a stalled RNAP from the stall site, with Mfd pushing the RNAP forward and UvrD pulling it backward (Epshtein et al. 2014; Ganesan et al. 2012). Similarly, in eukaryotes, the transcription-coupled nucleotide excision repair pathway helps to remove RNAP (Vermeulen and Fousteri 2013). If such strategies fail, persistent RNAP complexes can be ubiquitylated and degraded through the proteasome-dependent pathway (Wilson et al. 2013) or released using components of the RNA interference pathway (Castel et al. 2014; Zaratiegui et al. 2011) (Fig. 2).

In addition, RNAP oscillates between productive and backtracked states at numerous DNA positions. ‘RNAP backtracking’ refers to the state where RNAP reverses on the template by one or more nucleotides, and is essential for the maintenance of transcriptional fidelity (Artsimovitch and Landick 2000; Nudler 2012). Deep sequencing of 3′ ends of nascent transcripts associated with RNAPII in yeast has indicated that 75% of RNA polymerase pausing sites are associated with RNAP backtracking (Churchman and Weissman 2011). In contrast to this physiological RNAP backtracking, unscheduled backtracking can lead to DNA damage and is proposed to be responsible for the formation of at least some of the double-strand DNA breaks generated during co-directional encounters (Dutta et al. 2011) (Fig. 2). In general, RNAP backtracking events are rarely seen during the elongation phase of transcription. However, when encountering obstacles, such as damaged DNA or stably bound proteins, RNAP backtracking can occur during transcription elongation. In this situation, RNAP may lose contact with the end of the nascent transcript, leading to an arrested RNAP that will eventually be encountered by the replication machinery (Gomez-Herreros et al. 2012).

Fortunately, anti-backtracking processes exist to prevent any deleterious consequence caused by RNAP backtracking. In prokaryotes, transcription can be directly coupled with translation due to the lack of a nuclear membrane, so that the ribosome trailing the RNAP limits backtracking (Proshkin et al. 2010). In addition, some transcription factors, including TFIIS in eukaryotes (Thomas et al. 1998), can assist transcription elongation by reducing nucleotide mis-incorporation, which is the main cause of RNAP backtracking, thereby preventing replication and transcription collisions. Moreover, backtracked RNAP can be reactivated by specific factors, including GreA and GreB in bacteria (Opalka et al. 2003) and TFIIS in eukaryotes (Cheung and Cramer 2011). These factors can cleave the transcript to produce a new 3′-OH end, such that transcription can resume. Interestingly, human RECQL5 is also proposed to prevent replication and transcription conflicts by restricting RNAP backtracking (Saponaro et al. 2014; Urban et al. 2016) (Fig. 2).

Hotspots of replication and transcription conflicts

Although cells possess numerous mechanisms to prevent or resolve replication and transcription conflicts, certain genomic regions seemingly encounter more conflicts due to the nature of their DNA sequence or genomic location, or their association with transcribed genes and transcription factor binding sites (i.e., enhancers and promoters). As a result, these regions are deemed to be hard-to-replicate and are prone to be unstable.

Fragile sites

Chromosome fragile sites are genomic regions displaying visible breaks or gaps (fragility) on metaphase chromosomes (Sutherland 1979) following replication perturbation (often referred to as replication stress; RS). Based on their frequency in the population, these loci are classified as either common fragile sites (CFSs), which are present in all individuals, or rare fragile sites (RFSs), which exist in less than 5% of individuals (Kremer et al. 1991; Sutherland et al. 1998). To date, over 200 CFSs (Mrasek et al. 2010) and 33 RFSs (Sutherland 2003) have been identified in human cells.

CFSs are hard-to-replicate and are hotspots for genomic alterations found in human cancers. The molecular pathway underlying CFS-associated genomic instability and the reason why CFSs are hypersensitive to RS are still not fully understood. Several features of CFSs may underlie their instability. First, some CFSs are naturally late replicating. FRA3B was the first CFS shown to be late replicating during S phase using fluorescence in situ hybridization (FISH) (Le Beau et al. 1998). Through labeling late-replicating DNA with BrdU in human cells, Wang et al. showed that RS-induced breaks or gaps preferentially occurred on the allele of FRA3B that replicated later. Similar effects were observed at other CFSs studied, including FRA16D, FRA7H, FRA7G, FRA1H, and FRA2G (Palakodeti et al. 2004; Pelliccia et al. 2008). These studies provided direct evidence that allele-specific late replication is involved in the fragility of many of the most prominent CFSs in human cells (Wang et al. 1999). This replication-timing analysis indicated that these CFSs either start to replicate very late or the progression of the replication machinery is unusually slow or interrupted in S phase.

CFSs are also considered to be associated with replication origin-poor loci. A study on FRA3B in lymphoblastoid cells showed that replication fork speed within FRA3B is similar to that in the bulk genome, and instead, it is the lack of replication initiation events that contributes to its fragility (Letessier et al. 2011). This might explain to some extent the known tissue specificity of CFS expression. It is known that the replication program is reset during differentiation, which leads to a defined, tissue-specific, replication timing profile (Dazy et al. 2006; Hansen et al. 2010; Hiratani et al. 2008). Indeed, FRA3B has a paucity of replication initiation events in lymphoblastoid cells and is known to be highly fragile under RS in this cell type, but not in fibroblasts. Similar results have been observed with FRA16D and FRA6E (Letessier et al. 2011; Palumbo et al. 2010).

Another feature of CFSs is that DNA secondary structures can form within these regions. Early studies demonstrated that some CFSs (FRA2G, FRA3B, FRAXB, FRA7E, FRA7H, FRA8C, and FRA16D) are enriched with A/T rich sequences (e.g., AT microsatellites) that tend to adopt an abnormal DNA structure with a very high degree of flexibility and instability (Arlt et al. 2002; Boldog et al. 1997; Mishmar et al. 1998; Ohta et al. 1996). It is plausible that secondary structures located within CFSs could then perturb DNA replication fork progression. In support of this, transfer of a short AT-rich region (Flex1) derived from FRA16D into the yeast genome can cause replication stalling and DNA breakage at this region (Zhang and Freudenreich 2007). Also, in mammalian cells, CFS-derived AT-rich DNA is a hotspot for mitotic recombination (Wang et al. 2014). In addition, proteins known to counteract DNA secondary structures, such as DNA helicases (e.g., BLM) and structure-specific endonucleases (e.g., MUS81), are reported to be crucial for CFS stability, which supports the notion that DNA secondary structures might play a crucial role in the fragility of at least some CFSs (Lu et al. 2013; Pirzio et al. 2008; Shah et al. 2010; Ying et al. 2013).

Despite the above discussion, it is hard to envisage how any particular sequence feature could explain the tissue-specific nature of CFS fragility. A comparison of CFSs between two different cell types showed that < 20% of the sites of CFS fragility are shared (Le Tallec et al. 2013). Hence, the propensity of a given CFS to be fragile varies greatly between two different cell types. For example, FRA16D is among the most fragile sites in human lymphocytes, but shows much lower levels of fragility in fibroblasts (Helmrich et al. 2006; Hosseini et al. 2013; Le Tallec et al. 2011, 2013).

One interesting feature of CFSs is that they frequently co-localize with large genes, with some as large as 1–2 megabase pairs (Mb) (Smith et al. 2007). For example, the two most frequently studied CFSs, FRA16D and FRA3B, are associated with the very long WWOX (1.1 Mb) and FHIT (1.5 Mb) genes, respectively (Huang et al. 2015; Ohta et al. 1996). Indeed, a large-scale CFS mapping study in epithelial and erythroid cells indicated that more than 80% of human CFSs, and 100% of CFSs in mouse embryonic fibroblasts, harbor genes over 300 kb in size, more than five times the median gene size (Le Tallec et al. 2013). It has also been reported that actively transcribed large genes are highly associated with CFS expression, as well as being hotspots for copy-number variation (CNV) (Wilson et al. 2015). As noted above, the collisions between transcription and replication that would be expected to occur more frequently in long genes can induce R-loop formation that is deleterious to replication fork progression, increasing the probability of the CFS being fragile. Consistent with this being a factor, overexpression of RNase H1, which removes R-loops, can reduce CFS expression (Helmrich et al. 2011). Moreover, the CFS marker, FANCD2, which was shown to bind to the center of large actively transcribed genes under RS (Pentzold et al. 2018), is required for resolving R-loops formed within CFSs (García-Rubio et al. 2015; Madireddy et al. 2016; Schwab et al. 2015). Because transcription is constantly reprogrammed during cell development, and some tissue-specific transcription components are known to regulate the change of gene expression during development (Hiller et al. 2001), transcription of large genes harboring CFSs could readily explain the tissue-specific nature of CFS expression.
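
To give a sense of scale for why very long genes are so exposed to replication forks, the time needed to transcribe a gene can be estimated as its length divided by the elongation rate. The Python sketch below uses the WWOX and FHIT sizes quoted above. The constant elongation rate of 2 kb per minute and the helper name are illustrative assumptions, not values or methods taken from the cited studies.

```python
def transcription_time_hours(gene_length_bp: float,
                             elongation_rate_bp_per_min: float) -> float:
    """Estimate the hours needed for one RNAP to traverse a gene.

    Assumes a constant elongation rate with no pausing or termination.
    """
    if elongation_rate_bp_per_min <= 0:
        raise ValueError("elongation rate must be positive")
    return gene_length_bp / elongation_rate_bp_per_min / 60.0


# Gene sizes are those quoted in the text; the rate is an assumed value.
ASSUMED_RATE_BP_PER_MIN = 2_000.0
gene_sizes_bp = {"WWOX": 1_100_000, "FHIT": 1_500_000}

for gene, size_bp in gene_sizes_bp.items():
    hours = transcription_time_hours(size_bp, ASSUMED_RATE_BP_PER_MIN)
    print(f"{gene}: about {hours:.1f} h per transcript")
```

On these assumptions, a single round of transcription across either gene lasts many hours, so an active gene of this size is unlikely to be free of RNAP when a replication fork arrives.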

Rare fragile sites (RFSs) exist in fewer than 5% of individuals. RFSs often contain di- or tri-nucleotide repeat sequences that are unstable and prone to expand and contract. They are sub-divided into folate-sensitive and non-folate-sensitive groups. Currently, there are 24 known folate-sensitive RFSs, all of which are associated with the expansion of a CGG-repeat sequence (Durkin and Glover 2007). To date, FRAXA is the best studied folate-sensitive RFS in humans because of its association with defined pathological disorders (Kidd et al. 2014). The expansion of the CGG-repeat element within the 5′ untranslated region (5′UTR) of the human FMR1 gene at FRAXA is responsible for disorders that arise through different mechanisms: FMR1 gene silencing gives rise to fragile X syndrome (requiring > 200 CGG repeats), whereas toxic RNA stabilization induces an ovarian insufficiency disorder or a neurological disorder associated with ataxia where there are between 50 and 200 CGG repeats (Bell et al. 1991; Hansen et al. 1992; Santoro et al. 2012; Sherman 2000). Using DNA:RNA immunoprecipitation (DRIP) of genomic DNA from cells with either normal (∼ 30 CGGs) or pre-mutation FRAXA alleles (55–200 CGGs), Loomis et al. reported that transcription through the GC-rich FMR1 5′UTR region favors R-loop formation. It is plausible that, in those cells with either normal or pre-mutation FRAXA alleles, the formation of R-loops (and possibly other DNA secondary structures) at the transcribed FMR1 gene could generate an obstacle for the DNA replication machinery. This might then lead to the activation of a group of DNA-damage proteins, which could either promote replication re-start or inadvertently trigger the abnormal expansion of the CGG repeats during attempted repair (Loomis et al. 2014). Further research is clearly warranted in this direction. Interestingly, by analyzing a cell line with a mutant FRAXA allele (> 500 CGGs), Groh et al. demonstrated that R-loops act as an initial trigger to promote FMR1 gene silencing, possibly by their association with the repressive H3K9me2 chromatin mark (Groh et al. 2014).

There is also a group of fragile sites designated ‘early replicating fragile sites’ (ERFSs) (Barlow et al. 2013). These are fragile sites that tend to overlap with highly transcribed, early replicating gene clusters. In addition, ERFSs often lie close to replication origins, which might increase the probability of replication–transcription collisions (Barlow et al. 2013; Mortusewicz et al. 2013). Importantly, like that of CFSs, the fragility of ERFSs can also be stimulated by oncogene overexpression. It is thought that transcription occurring in the G1 phase serves to inactivate replication origins within the body of many transcribed genes, thus reducing the potential for conflicts (Lemmens et al. 2018). Shortening of G1 caused by oncogene overexpression, therefore, leaves insufficient time for transcription of genes, and, subsequently, increases the chances of replication–transcription conflicts (Halazonetis et al. 2008; Macheret and Halazonetis 2018).

Highly transcribed regions

In human cells, rRNA and tRNA genes are highly transcribed by RNAPI and RNAPIII, respectively, to meet the demand for protein synthesis. Therefore, the chances of the replication and transcription machineries colliding with each other should be particularly high at these loci (Deshpande and Newlon 1996; Kobayashi 2014; Takeuchi et al. 2003). As mentioned earlier, however, replication fork barriers are a feature of rDNA and tDNA loci that limits head-on collisions. Nonetheless, collisions associated with R-loops are commonly seen at rDNA and tDNA loci (Chen et al. 2017; Lindström et al. 2018). Any deleterious consequences of such collisions are seemingly limited by the action of RNase H1, PIF1, and TOP1 (El Hage et al. 2010, 2014; Shen et al. 2017; Tran et al. 2017).

Telomeres

Telomeres are the repetitive DNA sequences located at the ends of chromosomes. Telomeres also exhibit fragility, as shown by the discontinuity or loss of signals when probing for telomeric DNA using FISH (Özer et al. 2018). Telomere fragility is also exacerbated by a low dose of APH, in the manner used for induction of CFS fragility, supporting the notion that telomeres are hard to replicate and should be viewed as a class of specialized fragile sites (Sfeir et al. 2009). Although telomeres are largely heterochromatic, transcription can be initiated in the sub-telomeric region to generate a transcript called TERRA (telomeric repeat-containing RNA). TERRA can form stable R-loops with telomeric DNA (Azzalin et al. 2007), and indeed, these have been observed in many eukaryotic cell types (Rippe and Luke 2015). As discussed above, R-loops are challenging obstacles for replication forks and among the major sources of RS. In addition, the displaced, G-rich single-stranded DNA formed during TERRA R-loop formation favors the stabilization of G-quadruplexes. Thus, TERRA R-loops may be a major source of RS at telomeres. The key enzyme in resolving R-loops, RNase H1, is known to regulate telomere length in telomerase-negative ALT cells that express elevated levels of TERRA (Arora et al. 2014). The FEN1 flap endonuclease is also known to process DNA–RNA hybrids to avoid telomere fragility (Teasley et al. 2015), while loss of the UPF1 helicase leads to TERRA accumulation and impaired telomeric leading strand DNA synthesis (Azzalin et al. 2007). Intriguingly, in human telomerase-positive cells and in yeast cells, TERRA abundance is cell-cycle-dependent, with the lowest level in the S and G2 phases (Flynn et al. 2015). This is probably one of the strategies cells have evolved to avoid collisions between TERRA transcription and the replication fork within telomeres.

Concluding remarks

DNA replication and RNA transcription are two processes that are essential for cell proliferation. Ironically, because these two processes are competing for the same template DNA, conflicts between them are inevitable. From bacteria to mammals, all cells have adopted strategies to minimize these conflicts or resolve them faithfully if they happen to occur. Recent lines of evidence indicate that replication–transcription conflicts represent perhaps the major source of intrinsic genomic instability, which has the potential to drive human disease. Therefore, it is crucial to decipher the processes underlying the cause of these conflicts, and the mechanisms used to ameliorate conflicts when they occur.

In the past two decades, one of the most significant advances in this field has been the recognition of the importance of R-loops in driving RS and genome instability (Drolet et al. 1995). If these structures are not resolved in a timely manner, they can form obstacles for DNA replication, and present serious problems for genome maintenance. These advances in R-loop biology have been greatly facilitated by the development of an antibody (S9.6) that can bind to R-loop structures (Boguslawski et al. 1986). This permits the direct detection of R-loops in cells using immunofluorescence (Sollier et al. 2014), although there are caveats about its specificity in this context, as well as the identification of genomic regions that form R-loops using DRIP coupled with high-throughput sequencing (Halász et al. 2017). Such analysis has also permitted the identification not only of the epigenetic changes in the vicinity of R-loops (Chen et al. 2015; Ginno et al. 2013; Sanz et al. 2016), but also, when coupled to mass spectrometry, of the proteins bound to R-loops (Cristini et al. 2018). These technological advances will also greatly facilitate research in the coming years to address the following questions (amongst many others): (1) How do R-loops contribute positively to physiological processes such as regulation of gene expression and the initiation of DSB repair (Ohle et al. 2016)? Do these putative functions occur mainly through the recruitment of R-loop-binding proteins? (2) In cancer cells where there is an elevated frequency of transcription–replication conflicts at specific hotspots like CFSs, which helicases/nucleases are employed to promote R-loop resolution? (3) When R-loops cannot be resolved, how is a DNA-damage response initiated, and what are the downstream consequences of such a response?
Given the interest in the basic biology of DNA replication and transcription, and the emerging role played by replication–transcription conflicts in human disease, we do not anticipate any diminution in activity in this field in the coming years.