Introduction

Humans start their life as a single cell that has to repeatedly divide to create the ~ 40 trillion cells that comprise the human body (Bianconi et al. 2013). It is essential that all the genetic information contained in the zygote is reliably transmitted to all daughter cells to guarantee proper development. At each cell division, DNA replication involves the activation of tens of thousands of replication origins to ensure complete genome duplication. This program must be very robust and be able to adjust to the gene transcription programs and the chromatin organization of the different cell types that they are becoming. Furthermore, a huge level of DNA replication and cell division is necessary during the entire human life span to replace old, dead or damaged cells. If DNA replication fails, genome integrity is challenged and many diseases, such as cancers and neurological disorders, can arise (Ganier et al. 2019; Zeman and Cimprich 2014). It is therefore essential that this program is correctly accomplished. However, during the life span, large numbers of exogenous and endogenous replication stresses routinely challenge DNA integrity and lead to genome instability. In particular, growing evidence indicates that gene transcription itself is an important, yet unavoidable endogenous replication stress, which can either suppress replication initiation, or can generate conflicts with the DNA replication process. In this review, we focus on transcription-mediated replication stress and its impact on human diseases. First, we describe the mechanisms of DNA replication initiation and control, as well as its relation to gene transcription. We then discuss the different mechanisms by which transcription acts as a notable source of replication stress to induce genome instability. Finally, we explain how such transcription-mediated replication stresses are involved in various human diseases.

Origin licensing and firing: a two-step process

The nuclear genome must be correctly duplicated once and only once per mitotic cell division. To avoid genome re-replication, DNA replication is temporally divided into two steps: (i) origin licensing, which takes place between mitosis and the beginning of the next interphase, when all the possible replication origins are recognized and loaded with the pre-replication complex (pre-RC), and (ii) origin firing, which takes place during S phase (Fig. 1). Although most of the available information comes from yeast, the overall process seems to be highly conserved in other eukaryotes. In this section, we mainly report findings from budding yeast and integrate them with information from other eukaryotes.

Fig. 1

Origin licensing and firing occur over a two-step process. The first step of DNA replication, called origin licensing, consists of loading the pre-replication (pre-RC) complex onto chromatin on all the potential replication origins along the genome. This occurs between the end of mitosis and G1 phase. The second part of the process takes place in S phase, where replication origins are activated through the recruitment of limiting factors that lead to the conversion of pre-RC to pre-IC (pre-initiation complex). This transition is regulated by the replication timing program that marks the order of replication of the genome, with origins in the early-replicating regions fired before those in the late-replicating regions. To avoid re-replication, components of the pre-RC are segregated, exported or degraded, therefore impairing re-licensing

The loading of the pre-RCs onto chromatin starts with the recognition of origins by a hetero-hexamer called origin recognition complex (ORC, ORC1-6) (Bell and Stillman 1992), which further recruits other factors to form the pre-RCs. In yeast, replication origins are associated with specific sequences called autonomously replicating sequences (ARS) (Marahrens and Stillman 1992), while in higher eukaryotes the situation is less clear and replication origins are not defined by a specific sequence. Multiple techniques have been used to map replication origins and the results are not concordant, suggesting that we might be looking at different subsets of origins based on the limitations of the various approaches (see Prioleau and MacAlpine 2016; Ganier et al. 2019 for review). Origins identified with the small nascent strand (SNS) method seem to be enriched at transcriptional start sites (TSSs) (Sequeira-Mendes et al. 2009; Cadoret et al. 2008), origin G-rich repeated elements (OGRE) (Cayrou et al. 2012), G quadruplex (G4) (Besnard et al. 2012), high CpG and GC content regions (Cayrou et al. 2011; Delgado et al. 1998; Cadoret et al. 2008), while the OK-seq (Okazaki fragment sequencing) method has shown that origins preferentially position within the intergenic regions before and/or after gene bodies that are AT rich (Tubbs et al. 2018; Petryk et al. 2016) (Fig. 2).

Fig. 2

Origin distributions at early and late-replicating regions. Origin distribution differs between distinct regions of the genome. Within the early-replicating genome (Left panel), replication origins are enriched in intergenic regions between active genes. This effect might arise because active transcription can cause disassembly of the pre-RC or sliding of the MCM complex away from the original loading position due to the passage of the RNA polymerase over transcribed genes. In the late-replicating regions (Right panel), which are frequently associated with regions that lack gene transcription, replication origins are almost randomly distributed.

Replication origins are also marked by epigenetic signatures, such as H2A.Z and H4K20me2/3, the presence of which is needed for ORC1 recruitment (Beck et al. 2012; Long et al. 2020; Kuo et al. 2012). ORC binding is followed by the recruitment of cell division control protein 6 (CDC6), which stabilizes ORC binding (Speck and Stillman 2007), and CDC10-dependent transcript 1 (CDT1). CDT1 is loaded onto chromatin together with the mini-chromosome maintenance (MCM2-7) helicase complex through the interaction with ORC (Evrin et al. 2009; Maiorano et al. 2000; Nishitani et al. 2000). A second MCM complex is then loaded by ORC in an inverted orientation to form an MCM double hexamer (Miller et al. 2019). This process completes origin “licensing” (Fig. 1). The pre-RC is not stable on chromatin, and recent studies in yeast and Drosophila have suggested that gene expression can alter origin licensing by disassembling the pre-RC or by sliding the MCM complex away from the original loading position due to the passage of the RNA polymerase over transcribed genes (Gros et al. 2015; Powell et al. 2015). This process might explain the preferential localization of replication initiation sites within intergenic regions between active genes.

To avoid re-replication when cells enter S phase, components of the pre-RC are made inaccessible through post-translational modifications that can cause their inactivation, export out of the nucleus and degradation, or as in the case of CDT1, segregation via an interaction with GEMININ (Ballabeni et al. 2013; Petersen et al. 2000, 1999; Nguyen et al. 2000; Li and DePamphilis 2002; Méndez et al. 2002). At the transition between G1 and S phase, fully formed pre-RCs are phosphorylated at specific sites by Dbf4-dependent kinase (DDK) and cyclin-dependent kinase (CDK). These phosphorylation events lead to CDC45, treslin and Mdm2-binding protein (MTBP) recruitment (Boos et al. 2013; Kumagai and Dunphy 2017; Heller et al. 2011; Ilves et al. 2010; Jares and Blow 2000). Treslin phosphorylation leads to the recruitment of topoisomerase 2-binding protein 1 (TOPBP1), RecQ-like helicase 4 (RECQL4), GINS complex and Pol ε and the subsequent conversion of pre-RCs into pre-initiation complexes (pre-ICs) (Tanaka et al. 2007; Kumagai et al. 2011; Boos et al. 2011; Muramatsu et al. 2010; Sangrithi et al. 2005). At this point, treslin, MTBP, RECQL4 and TOPBP1 are released and the active replisome is formed thanks to MCM10 loading (Kanke et al. 2012; Watase et al. 2012; Kanemaki and Labib 2006; Gambus et al. 2006). Finally, other proteins such as replication protein A (RPA), proliferating cell nuclear antigen (PCNA) and replication factor C (RFC) are loaded and DNA replication starts, called origin firing (MacNeill 2012) (Fig. 1).

Origins usage and replication timing

Of all the potential replication origins that are loaded with a pre-RC, only a subset will actually be fired. Most of the remaining origins are licensed to serve as a backup in case replicative stress stalls the replication forks (Ge et al. 2007; Woodward et al. 2006; Santocanale et al. 1999). Moreover, replication origins do not all fire at the same time; instead, they follow a cell-type specific spatio-temporal program, known as the replication timing (RT) program (Dimitrova and Gilbert 1999) (Fig. 2). In mammalian cells, this program is established during G1 phase, ~ 2 h after mitosis, in a time window referred to as the timing decision point (TDP) (Lu et al. 2010; Li et al. 2001, 2003; Wu et al. 2006; Dimitrova and Gilbert 1999). Interestingly, the establishment of the RT precedes the choice as to which origins are going to be used during S phase; that choice instead occurs later in G1 phase, at the origin decision point (ODP) (Dimitrova and Gilbert 1999; Li et al. 2003). The relation between TDP and ODP, and the corresponding mechanism(s), need to be further investigated. RT establishment temporally corresponds to the re-establishment of an organized nuclear architecture after mitosis, with the anchoring of chromosomes to the nuclear periphery (Dimitrova and Gilbert 1999; Li et al. 2001) and the establishment of topologically associating domains (TADs) and the A/B compartments (Dileep et al. 2015). Likewise, RT domains overlap extensively with TADs, and their early or late replication corresponds to the A or B compartment, respectively (Pope et al. 2014; Ryba et al. 2010).

Such a strong correlation between RT and the 3D genome structure led the field to hypothesize that these two processes might be coupled and that one might control the other. This hypothesis has been reinforced by the identification of Rif1 (Rap1-interacting factor 1) as a nuclear structural protein that has an important role in RT regulation (Foti et al. 2016). Conversely, knock-outs or knock-downs of several other nuclear structural proteins, such as cohesin or CTCF, alter chromatin structure but not RT (Oldach and Nieduszynski 2019; Rao et al. 2017; Sima et al. 2019; Nora et al. 2017). Moreover, it has been recently shown that Rif1 haploid cells show alterations in chromatin structure but normal RT, which indicates that although the two processes are coordinated, they can be uncoupled (Gnan et al. 2019).

For years, the field has investigated the possibility that RT could reflect gene transcription. In general, early-replicating regions are enriched with expressed genes, while late-replicating regions are not (Woodfine et al. 2004) (Fig. 2). In addition, during development, regions switching RT from early-to-late (or late-to-early) are associated with genes whose expression is switched off (or on) (Hiratani et al. 2008). However, there are numerous exceptions: for example, regions containing expressed genes can also be replicated late, which challenges a direct link between RT and gene transcription (Rivera-Mulia et al. 2015). In addition, switching off genes at the β-globin locus fails to alter RT when chromatin accessibility is not modified, which also seems to go against this model (Cimbora et al. 2000). Indeed, the change in RT at the β-globin locus is associated with changes in accessibility, which seems to support the idea that RT is associated with chromatin accessibility rather than gene expression (Cimbora et al. 2000). In fact, early-replicating regions are associated with open chromatin states (A compartment), while late-replicating regions are enriched in closed chromatin states (B compartment) (Pope et al. 2014). In a recent article, Dileep and colleagues showed that changes in RT can precede or follow changes in gene transcription or be totally independent from it (Dileep et al. 2019). It is therefore likely that both RT and gene transcription are regulated by some common factors shared between the two processes.

How RT and origin usage are regulated is not fully understood, but they can be explained through a model of differential affinity for limiting factors (Fig. 1). To date, limiting factors have been identified in some organisms and include proteins that are essential for the assembly of the pre-IC, such as CDC45, DBF4/CDC7 (regulatory/catalytic subunit of DDK), RECQL4, treslin, TOPBP1 and MTBP orthologs (Mantiero et al. 2011; Wu and Nurse 2009; Collart et al. 2013; Wong et al. 2011; Tanaka et al. 2011). What regulates the affinity of these limiting factors for replication origins is still unclear, but multiple layers of regulation are probably in place. A first possibility lies in chromatin looping, which clusters together the origins being fired and leaves backup origins on the periphery of the loops (Courbet et al. 2008). Along the same line, the order of firing could be regulated through chromatin accessibility. As discussed previously, the early-replicating regions have a more open chromatin state than the late-replicating regions (Pope et al. 2014), which might make the late-replicating regions inaccessible at the beginning of S phase. Moreover, some proteins regulate RT globally by controlling access to the limiting factors. One of these is RIF1, which is enriched at late-replicating regions: RIF1 counters DDK activity through its interaction with PP1 (protein phosphatase 1), which dephosphorylates components of the pre-RC and limits origin firing until late S phase (Cornacchia et al. 2012; Mattarocci et al. 2014; Poh et al. 2014; Hiraga et al. 2014; Sukackaite et al. 2017). In fission yeast, the shelterin complex (also called telosome) is involved in RT regulation of a subgroup of late origins through Rif1. Shelterin can recruit Rif1 on telomeric DNA, as Taz1 does, and also brings late-replicating regions into the proximity of Rif1 (Tazumi et al. 2012; Ogawa et al. 2018; Kanoh and Ishikawa 2001). 
Moreover, Rap1 and Poz1 (two members of the shelterin complex) depletion can impact RT in an indirect manner. In fact, these mutants exhibit abnormal telomere elongation that delocalizes PP1 ortholog from the late Rif1-dependent and Taz1-independent regions to telomeres (Hasegawa et al. 2019). Fork head 1 and 2 (Fkh1/2) are two transcription factors that have also been reported to regulate RT in yeast. These factors group early origins into clusters to facilitate DDK activity (Knott et al. 2012; Fang et al. 2017) via a direct interaction between Fkh1/2 and Dbf4 (Fang et al. 2017). Similarly, Ctf19 and Swi6 recruit DDK to pericentromeric origins, allowing centromeres to replicate early in budding and fission yeasts (Hayashi et al. 2009; Natsume et al. 2013). In S. cerevisiae, two histone deacetylases, Sir2 and Rpd3, control the RT of origins located within the ribosomal DNA (rDNA) array by tuning their ability to compete with single-copy origins for limiting factors (Yoshida et al. 2014). Work is ongoing to identify additional factors and delineate the underlying mechanisms controlling the origin usage and RT. Such work will help us better understand the complex relationship between DNA replication, gene transcription and chromatin organization.
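
The limiting-factor model discussed above can be illustrated with a minimal toy simulation. This sketch is not taken from any of the cited studies: the function name simulate_firing, the numeric affinity values and the assumption that a fixed number of factor units becomes available at each time step are all hypothetical simplifications. Passive replication, dormant origins and factor recycling kinetics are deliberately ignored. The only point illustrated is that, when firing factors are limiting, origins with a higher relative affinity tend to fire earlier.

```python
import random


def simulate_firing(affinities, n_factors, seed=0):
    """Toy model: at each step, n_factors origins fire, drawn without
    replacement with probability proportional to their (hypothetical)
    affinity for the limiting factors. Returns the firing step of each origin."""
    rng = random.Random(seed)
    unfired = list(range(len(affinities)))
    firing_time = [None] * len(affinities)
    step = 0
    while unfired:
        # The limiting factors can activate at most n_factors origins per step.
        for _ in range(min(n_factors, len(unfired))):
            weights = [affinities[i] for i in unfired]
            chosen = rng.choices(unfired, weights=weights, k=1)[0]
            firing_time[chosen] = step
            unfired.remove(chosen)
        step += 1
    return firing_time


# Hypothetical example: 50 "open chromatin" origins (high affinity)
# and 50 "closed chromatin" origins (low affinity).
affinities = [10.0] * 50 + [1.0] * 50
times = simulate_firing(affinities, n_factors=5, seed=1)
mean_high = sum(times[:50]) / 50
mean_low = sum(times[50:]) / 50
print(f"mean firing step, high affinity: {mean_high:.1f}")
print(f"mean firing step, low affinity:  {mean_low:.1f}")
```

In this sketch, a regulator such as RIF1-PP1 would correspond to lowering the affinity values of a subset of origins, and a factor such as Fkh1/2 to raising them; both mappings are interpretive assumptions rather than quantitative claims.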

Transcription-mediated replication stresses and genome instability

As described earlier, replication initiation control is a multi-step process ensuring that the entire genome is replicated once and only once per cell division. Gene transcription can interact with the DNA replication program at all stages, i.e., during G1 phase, when origins are set (location, firing time, etc.), and during S phase, when origins are activated and replication forks progress. Here, we describe in detail how the influence of gene transcription on DNA replication can lead to genome instability in normal and pathological conditions, and how it contributes to human diseases.

Transcription–replication collision, R-loop formation and genome instability

Once replication forks have been deployed, their progression can be challenged by numerous factors. One such factor is the presence of active transcription along the genome. Collisions between the replication fork and the transcription machinery can be either co-directional (CD) or head-on (HO) (Fig. 3). The latter can be more dangerous for genome integrity (Hamperl et al. 2017). OK-seq, which identifies the direction of replication fork movement, has revealed that origin firing occurs more frequently upstream of the TSSs of active genes, ensuring co-directional replication of the most highly transcribed regions of the genome (Petryk et al. 2016). Replication termination was also found to be widely localized at the transcription termination sites (TTSs) of transcribed genes under unperturbed conditions. Under replication stress, however, replication termination can redistribute to gene bodies, which increases replication of gene 3′ ends in an HO orientation (Chen et al. 2019) and strongly induces transcription–replication conflicts (TRCs).
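
The distinction between CD and HO encounters can be stated as a simple rule, sketched below. This is an illustrative helper, not part of any OK-seq analysis pipeline: the function names, the "+"/"-" encoding of fork direction and gene strand, and the assumption that a gene is replicated by forks from a single origin are all hypothetical simplifications.

```python
def collision_orientation(fork_direction, gene_strand):
    """Return 'CD' if the fork and the RNA polymerase move the same way,
    'HO' if they move towards each other.

    fork_direction: '+' for a fork moving towards increasing coordinates,
                    '-' for a fork moving towards decreasing coordinates.
    gene_strand:    '+' or '-', the strand on which the gene is annotated.
    """
    if fork_direction not in ("+", "-") or gene_strand not in ("+", "-"):
        raise ValueError("direction and strand must be '+' or '-'")
    return "CD" if fork_direction == gene_strand else "HO"


def orientation_from_origin(origin_pos, gene_start, gene_end, gene_strand):
    """Orientation of the encounter for a gene replicated from one origin.

    Assumes (hypothetically) that no other origin fires nearby.
    An origin inside the gene body sends forks in both directions,
    so part of the gene is replicated CD and part HO ('mixed').
    """
    if origin_pos < gene_start:
        return collision_orientation("+", gene_strand)
    if origin_pos > gene_end:
        return collision_orientation("-", gene_strand)
    return "mixed"


# An origin upstream of the TSS of a '+' strand gene replicates it co-directionally.
print(orientation_from_origin(100, 500, 900, "+"))   # CD
# An origin located beyond the 3' end of the same gene replicates it head-on.
print(orientation_from_origin(1000, 500, 900, "+"))  # HO
```

Under this rule, the enrichment of origins upstream of active TSSs translates directly into a CD bias, whereas an origin firing within a gene body necessarily exposes part of that gene to HO encounters.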

Fig. 3

Transcription–replication conflicts lead to fork stalling and genome instability. Replication and transcription machineries share the same DNA template, which causes replication–transcription conflicts (TRCs). These conflicts can occur in a head-on or co-directional manner. Head-On TRC is generally considered as more deleterious to genome stability, and preferentially occurs around gene transcription termination sites (TTS). The replication forks stall when they encounter RNA Pol II, which favors the transient formation of R-loops. Under normal conditions, harmful R-loop accumulation can be prevented by many factors, such as TOP1, SETX, BRCA1/2, and FANCM. Alternatively, this accumulation can be directly removed by RNase H, XRN2 and certain NER endonucleases like XPG/XPF. If the R-loops and stalled forks persist, the ATR-Chk1 pathway is activated and phosphorylates RPA at the stalled forks. Under topological stress, such as TOP1 depletion, DNA damage is induced, which leads to genome instability (Promonet et al. 2020). R-loops also frequently form at gene transcription start sites (TSS), while they do not seem to induce TRCs and are rather involved in other mechanisms, like transcription regulation

Recently, numerous studies have revealed that TRCs are frequently associated with a specific structure known as R-loops (Fig. 3). R-loops are formed when RNA polymerase progresses along the DNA double strands, with newly transcribed RNA re-annealed to the transiently accessible template strand: a DNA:RNA hybrid forms that displaces the non-template strand (Thomas et al. 1976) mainly in the presence of high GC content sequences (Sanz et al. 2016). Importantly, by analyzing the genome-wide distribution of R-loops by DNA:RNA hybrid immunoprecipitation and next-generation sequencing (DRIP-seq), Cimprich and colleagues revealed that R-loops form preferentially at regions with HO TRC (Hamperl et al. 2017). These data reinforce the idea that the CD bias of the human genome might help to minimize the accumulation of HO collisions and deleterious R-loops.

Cells can also regulate R-loops by opposing their formation. R-loops preferentially form in the presence of negative supercoils, such as those generated during transcription. To resolve this torsional stress, cells use topoisomerases, which restore normal DNA topology and reduce the accumulation of R-loops (El Hage et al. 2010; Yang et al. 2014). Recently, P. Pasero, C.L. Chen and colleagues discovered that R-loop formation is enriched at TTSs for a subset of highly expressed genes located in early-replicating regions. Here, a higher level of HO collision is frequently associated with the accumulation of phospho-RPA32 (S33), a hallmark of stalled forks. As a result, an increase in DNA double-strand breaks (DSBs) and in γ-H2AX, a histone mark around broken replication forks, has been observed at these regions in cells depleted of topoisomerase 1 (Top1) (Promonet et al. 2020).

It should be noted that although the presence of R-loops at HO TRCs can be deleterious, R-loops also have important physiological roles in many normal cellular processes, including the regulation of transcription termination, chromosome segregation and rearrangement events (Skourti-Stathaki and Proudfoot 2014; Kabeche et al. 2018; Skourti-Stathaki et al. 2011; Xu et al. 2017a). The R-loop balance in cells is therefore maintained via various strategies to protect genome stability. As mentioned, cells use topoisomerases to reduce topological stress and decrease harmful R-loop accumulation (Fig. 3). Cells also possess RNase H enzymes and 5′–3′ exonucleases, such as XRN2 (Fig. 3), that can digest RNA from DNA:RNA hybrids. R-loops can also be prevented or resolved by helicases, such as DHX9 and Aquarius (AQR) (Sollier et al. 2014; Chakraborty and Grosse 2011), senataxin (SETX) (Groh et al. 2017) and PIF1 (Zhou et al. 2014). R-loop formation is also tightly regulated via spliceosome binding to RNA (Li and Manley 2005; Gómez-González et al. 2011; Li et al. 2007; Pefanis et al. 2015), the presence of single-stranded DNA-coating proteins such as RPA (Aguilera and García-Muse 2012; Nguyen et al. 2018) and the ATR-Chk1 pathway (Matos et al. 2020). Many studies have shown that mutations affecting these factors can induce R-loop-associated human diseases, which we discuss in more detail later.

Proteins of homologous recombination and non-homologous end joining on stalled forks

As obstacles to replication fork progression, R-loops can induce genome instability and thus inevitably activate DNA damage repair pathways. In particular, stalled forks deriving from TRCs activate the Fanconi anemia (FA) pathway, a repair system involved in the resolution of R-loop-mediated replication fork collapse (Schwab et al. 2015; García-Rubio et al. 2015). The disruption of the critical FA complex members FANCD2, FANCA and FANCM impairs the restart of stalled forks, and leads to genome instability and DNA damage from R-loop-mediated replication fork collapse (Schwab et al. 2015; García-Rubio et al. 2015). These effects can be reversed by over-expressing RNase H1, a ribonuclease that degrades DNA:RNA hybrids, reinforcing the idea that R-loops are responsible for fork stalling at HO TRC sites (Schwab et al. 2015). Interestingly, a recent study revealed that SLX4, a tumor suppressor, drives (via its interaction with RTEL1) the recruitment of FANCD2 to RNA polymerase II to prevent endogenous transcription-induced replication stress (Takedachi et al. 2020).

Besides the core FA complex members, other factors involved in homologous recombination (HR) accumulate at DSBs, such as RAD52, RAD51, BRCA1 (also called FANCS), and BRCA2 (also called FANCD1), and limit genome instability through R-loop resolution. Their recruitment is reduced by RNase H overexpression, as shown at actively transcribed regions or with specific reporter systems (D’Alessandro et al. 2018; Yasuhara et al. 2018). For example, BRCA1 and BRCA2 prevent the potentially harmful effects of R-loops by recruiting the helicase SETX to R-loops (Hatchi et al. 2015; Zhang et al. 2017). In particular, BRCA1-dependent recruitment of SETX resolves R-loop structures preferentially at TTSs and suppresses DNA damage. Moreover, SETX depletion impairs RAD51 recruitment and favors the accumulation of 53BP1, a key DNA damage response (DDR) factor in non-homologous end joining (NHEJ) (Cohen et al. 2018). These data suggest that DNA:RNA hybrids may favor the accumulation of HR factors, which could in turn facilitate the elimination of the hybrids so that HR can occur, likely counteracting NHEJ at DSBs within transcribed genes. Interestingly, a recent study revealed that 53BP1 and BRCA1 counteract each other to control the time-dependent switch between fork restart pathways: 53BP1 promotes the restart pathway with fast kinetics, whereas BRCA1 promotes the pathway with slow kinetics (Xu et al. 2017b). On the other hand, BRCA2 depletion also increases R-loop accumulation. BRCA2 might prevent R-loop formation by preventing replication fork collapse and recruiting the ssDNA binding protein RAD51 to DSBs (Schlacher et al. 2011). Moreover, BRCA2 recruits RNA polymerase II-associated factor-1 (PAF1) to promoter-bound Pol II to promote its release from promoter-proximal pausing and decrease R-loop formation (Shivji et al. 2018).

G1 shortening induces abnormal initiation and genome instability within gene body

G1 phase is an important period for origin setting. Rapidly proliferating mammalian embryonic stem cells (ESCs) exhibit a short G1 phase of < 2 h due to an unusual cell cycle structure (Savatier et al. 1994). Such a short G1 phase is considered a characteristic of ESCs that might help to inhibit differentiation and preserve their pluripotent state (Li et al. 2012). Several studies have reported that the short G1 phase of undifferentiated ESCs is related to a unique mechanism of cell cycle regulation. In particular, ESCs express low cyclin D1 levels and no cyclin D2/D3, lack MAPK and pRB control (Jirmanova et al. 2002; Savatier et al. 1996; White et al. 2005), lack the p53-p21 pathway in response to DNA damage (Aladjem et al. 1998) and show constitutive activity of cyclin E-Cdk2 and cyclin A-Cdk2 complexes throughout the cell cycle (Stead et al. 2002; White et al. 2005). These findings highlight that cell proliferation control in ESCs is fundamentally different from that in differentiated somatic cell lineages (Coronado et al. 2013). Ample stores of the factors required for replication, together with a relaxed chromatin structure, result in many more replication initiation sites in S phase in ESCs. Despite their short G1 phase, ESCs can effectively tolerate an accumulation of replication stress through extensive fork reversal and replication-coupled repair. This feature allows these cells to preserve genome stability, even though fast proliferating ESCs do not exhibit mechanisms to delay G2/M and G1/S transitions upon incomplete replication (Ahuja et al. 2016).

Conversely, somatic cells have a longer G1 phase, which might help to ensure proper origin licensing to guarantee complete genome duplication. Therefore, G1 shortening in somatic cells, e.g., by overexpressing cyclin E, associated with an altered G1-S transition, may lead to deregulation of replication fork progression and DNA damage (Jones et al. 2013). Cyclin E, a member of the cyclin family, has a critical role in controlling the G1-S transition. It binds CDK2 to form the cyclin E/CDK2 complex, which phosphorylates numerous downstream proteins (such as RB, p27, p21) to regulate multiple cellular processes, thus allowing replication initiation and S phase progression (Siu et al. 2012). Ekholm-Reed and colleagues demonstrated that overexpressing cyclin E can shorten the length of G1 phase from about 10–12 h to as little as 2–4 h (Ekholm-Reed et al. 2004). To deeply discern the detailed mechanisms related to the replication stress induced by cyclin E overexpression, Macheret and Halazonetis mapped DNA replication and transcription genome-wide in cells with abnormal cyclin E activation (Macheret and Halazonetis 2018). By investigating the DNA replication initiation profiles (HU-EdU-seq) from cells overexpressing cyclin E versus cells with normal cyclin E levels, they showed that cyclin E overexpression induces extra origins that are frequently located within intragenic regions (Fig. 4). In addition, analysis of newly synthesized transcript profiles through EU-seq has revealed that these novel origins induced by G1 shortening are often located at the 3′ ends of the gene body, showing lower levels of nascent transcripts in G1 cells due to G1 shortening. Importantly, a specific fork collapse has been observed around these origins that only appears under cyclin E overexpression, while fork collapse has not been observed for the constitutive origins (Fig. 4). Similar results have been obtained by overexpressing MYC. 
MYC-inducible activation leads to G1-phase shortening and to the firing of intragenic oncogene-induced (Oi) origins. Many of these Oi origins overlap with cyclin E-induced origins (Macheret and Halazonetis 2018). Moreover, overexpression of either gene can induce the firing of a novel set of replication origins within the 3′ gene body of highly transcribed genes; these origins are usually suppressed by transcription during G1 phase. The precocious entry into S phase, before all genic regions have been transcribed, allows the firing of origins within genes in cells with a short G1 phase (Macheret and Halazonetis 2018). Therefore, DNA replication stress resulting from extra intragenic origin firing caused by premature S phase entry is an important mechanism that leads to genomic instability in human cells.

Fig. 4

G1 shortening induces abnormal origin firing within active genes leading to genome instability. In normal cell cycles, the length of G1 is sufficient for transcription to inactivate origins across the entire length of genes (Top panel). When the length of G1 is greatly reduced due to oncogene expression (Bottom panel), there is insufficient time for transcription to inactivate all intragenic origins. This effect allows for the activation of oncogene-induced extra-origins, located within intragenic regions, and leads to chromosome breakage. G1 shortening, e.g., induced by cyclin E or Myc, leads to abnormal replication and genome instability, which might contribute to early cancer development

Interestingly, under replication stress, i.e., under high dose HU treatment, cells can accumulate replication fork stalling and collapse within specific early-replicating regions known as early-replicating fragile sites (ERFSs) (Barlow et al. 2013). These sites are also enriched around replication origins containing long (> 20 bp) Poly(dA:dT) tracts (Tubbs et al. 2018). Whether similar or different mechanisms generate ERFSs is still unknown and thus warrants further investigation.

Transcription-mediated suppression of initiation within large genes leads to CFS instability

Transcription–replication collisions and R-loop formation are not the only ways in which transcription can interfere with DNA replication. Common fragile sites (CFSs) are an example of this. These sites are under-replicated during mild replication stress, for example, in response to aphidicolin, a DNA-polymerase inhibitor that slows the progression of replication forks (Glover et al. 1984). CFSs can be visualized on metaphase spreads as ultrafine bridges between chromatids, gaps or breaks (Chan et al. 2009; Glover et al. 1984) that are hotspots for chromatid exchange (Glover and Stein 1987), chromosome deletions (Bignell et al. 2010; Pichiorri et al. 2008) and amplifications (Hellman et al. 2002; Miller et al. 2006). These regions are preferential sites for chromosome lesions (such as deletion and/or rearrangement) involved in oncogenesis, neurological disorders and viral DNA integration (see Le Tallec et al. 2014; Ozeri-Galai et al. 2014; Sarni and Kerem 2016; Debatisse and Rosselli 2019 for review). The study of CFSs is challenging due to the lack of precise genomic mapping. Traditionally, they have been mapped by conventional cytogenetic screening at a megabase scale. In lymphocytes, the number of CFSs ranges from ~ 20 (with break frequency ≥ 1%) to 230 (including CFSs with lower frequency) (Mrasek et al. 2010). Only a few of them have been mapped on a fine scale (several hundred kb) by molecular cytogenetic analysis combined with fluorescence in situ hybridization (FISH) (Savelyeva and Brueckner 2014), which is very time-consuming. Therefore, most collected data derive from isolated CFSs, which has resulted in some controversial results. In a recent study, CFSs were mapped genome-wide at a high resolution by Repli-Seq technique. The authors compared the RT of cells exposed to a low dose of aphidicolin to the RT of control cells to define the significant delayed regions (SDRs), corresponding to CFSs (Brison et al. 2019). 
This first genome-wide analysis has shed light on the characteristics and mechanisms responsible for CFS instability, demonstrating that stress-induced delay/under-replication is a hallmark of CFSs (Brison et al. 2019).
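The comparison of treated and control RT profiles described above can be illustrated with a toy sketch. It is not the statistical procedure of Brison et al. (2019): the function name, the RT scale (higher score means earlier replication), the fixed delay threshold and the 50 kb window size are all assumptions made purely for illustration.

```python
def delayed_regions(rt_control, rt_treated, min_delay=0.5, window_kb=50):
    """Return (start_kb, end_kb) spans where treated RT lags control RT.

    rt_control, rt_treated: per-window RT scores, higher = earlier
    (hypothetical scale). min_delay and window_kb are illustrative
    values, not parameters taken from any published analysis.
    """
    if len(rt_control) != len(rt_treated):
        raise ValueError("profiles must cover the same windows")
    regions, start = [], None
    for i, (ctrl, treated) in enumerate(zip(rt_control, rt_treated)):
        delayed = (ctrl - treated) >= min_delay
        if delayed and start is None:
            start = i  # open a new delayed region
        elif not delayed and start is not None:
            regions.append((start * window_kb, i * window_kb))
            start = None  # close the current region
    if start is not None:
        # a delayed region runs to the end of the profile
        regions.append((start * window_kb, len(rt_control) * window_kb))
    return regions


control = [1.0, 0.2, 0.1, 0.1, 0.9]
treated = [1.0, 0.1, -0.6, -0.5, 0.9]
print(delayed_regions(control, treated))
```

In this toy input only the third and fourth windows are delayed by at least the threshold, so a single region spanning them is reported. A real analysis would need replicate-aware statistics rather than a fixed cutoff.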

CFSs were long believed to be associated with particular sequences, such as stretches of AT-rich sequences that can form secondary structures that block replication fork progression, impede replication completion and lead to DNA breaks. However, recent studies have shown that CFS instability is cell-type specific, which indicates that it is directed by epigenetic features rather than by specific sequence motifs (Le Tallec et al. 2011). It has indeed been shown that such sequences at FRA3B (a well-studied CFS on chr3) do not overlap with its break boundaries (Durkin et al. 2008). CFSs are mid-late and late-replicating regions, but this feature alone is not enough to define them (Le Beau et al. 1998; Palakodeti et al. 2004; Pelliccia et al. 2008; Hellman et al. 2000; Brison et al. 2019), as there are many more late-replicating regions than CFSs. Interestingly, most fine-mapped CFS cores are replicated in mid-late S phase (instead of late) in non-treated cells, and they become the latest replicating regions only after aphidicolin treatment (Brison et al. 2019). This finding suggests that mechanisms other than late replication per se are responsible for their instability. Remarkably, CFSs are frequently associated with very long expressed genes (> 300 kb) or large transcription domains (sometimes with two or three overlapping genes), although even this is not always the case (Mitsui et al. 2010; Ohta et al. 1996; Rozier et al. 2004; Zhu et al. 2006; Helmrich et al. 2007; Denison et al. 2003; Bednarek et al. 2000; Brison et al. 2019). It has been suggested that CFSs might be caused by R-loop formation resulting from TRC (Helmrich et al. 2011). However, TRC seems an unlikely cause, as the replication delay decreases gradually, in most cases, on both sides of CFS cores in a symmetrical way that is independent of gene orientation but instead reflects the firing time of the flanking origins (Brison et al. 2019).
In addition, R-loops and fork stalling positions seem to only accumulate within highly active genes located at early-replicating regions, but not at large late-replicating genes associated with CFSs showing a modest transcription level (Liu and Chen, unpublished results). More importantly, gene transcription–replication encounters are not necessary for CFS expression, as treatments with transcription inhibitors during S phase do not rescue CFS fragility (Brison et al. 2019). Taken together, these results indicate that mechanisms other than transcription–replication encounters are responsible for the strong correlation between large genes and CFSs.

Importantly, on FRA3B (Letessier et al. 2011) and FRA16C (a CFS on chr16) (Ozeri-Galai et al. 2011), there is little or no activation of dormant origins to rescue stalled or slowed replication forks. This lack of activation might actually be due to the removal of replication origins by transcription (Gros et al. 2015; Powell et al. 2015). Indeed, the occupancy of components of the pre-RC is low over large genes (> 300 kb) associated with CFSs (Miotto et al. 2016; Sugimoto et al. 2018). The genome-wide analyses of replication origin distribution obtained by OK-Seq or Bubble-Seq along fine-mapped CFSs also support a model in which transcription-dependent suppression of initiation across large genes generates ultra-long (several hundreds of kb) late-replicating origin-poor regions, whose replication is delayed upon stress (Brison et al. 2019) (Fig. 5a). Moreover, OK-Seq data have further revealed that, in most cases, two major initiation zones flank the large transcribed genes hosting CFSs, located immediately upstream and downstream of the gene, respectively. The unidirectional forks emanating from these initiation zones travel across several hundreds of kb to complete replication of the gene body (Brison et al. 2019) (Fig. 5b). Replication cannot be completed when the fork speed is reduced by aphidicolin treatment. The distance separating the initiation zones flanking the genes is therefore a major parameter for CFS setting.
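The dependence on inter-origin distance and fork speed can be made concrete with a back-of-the-envelope calculation. The sketch below assumes two forks converging symmetrically at constant speed across an origin-free gap. The function name and the fork speeds (1.5 kb/min unperturbed, halved under stress) are illustrative assumptions, not measured values from the studies cited. The 1.1 Mb gap corresponds to the size of WWOX quoted in the Fig. 5 legend.

```python
def hours_to_replicate(gap_kb, fork_speed_kb_per_min):
    """Time (h) for two converging forks to replicate an origin-free gap.

    Each fork covers half of the gap, so time = (gap / 2) / speed.
    Assumes constant, equal fork speeds and no dormant-origin firing.
    """
    if fork_speed_kb_per_min <= 0:
        raise ValueError("fork speed must be positive")
    return (gap_kb / 2) / fork_speed_kb_per_min / 60


gap_kb = 1100  # WWOX gene size quoted in the Fig. 5 legend
for label, speed in [("unperturbed", 1.5), ("slowed", 0.75)]:  # assumed speeds
    print(f"{label}: {hours_to_replicate(gap_kb, speed):.1f} h")
```

Under these assumptions, the gap needs about 6 h of uninterrupted synthesis at the unperturbed speed and about 12 h when the speed is halved. Because the flanking initiation zones fire only in mid-late S phase, even a moderate slowdown would plausibly leave the central region unreplicated at mitotic entry.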

Fig. 5

Transcription-dependent suppression of initiation across large genes leads to CFS instability. a Schematic showing how gene transcription shapes the replication landscape responsible for common fragile site (CFS) instability. CFSs are genomic regions that are replicated during mid-late S phase. They are nested within large genes (> 300 kb) whose transcription leads to the removal of pre-RC complexes from the gene body, leaving it to be replicated by two long-travelling unidirectional replication forks arising from its flanking regions. Under replication stress, DNA replication might not be completed within these regions. This results in a cruciform structure that must be resolved, otherwise it will lead to the expression of CFSs and genome instability. b The replication fork directionality (RFD) profile detected by Okazaki fragment sequencing (OK-Seq) along the FRA16D CFS containing the large gene WWOX (1.1 Mb). Each point shows the RFD value computed in a 1 kb window. The red and blue points indicate the regions that are predominantly replicated by rightward and leftward replication forks, respectively. The RFD profile agrees with the model shown in (a), with two strong initiation zones (identified as upward transitions on the RFD profile, indicated by blue boxes) located at both extremities of the WWOX gene, and the gene body replicated by long-travelling unidirectional replication forks (red and blue arrows, respectively). The under-replicated CFS core overlaps with the termination zone (downward transition on the RFD profile, indicated by a red box) at the gene center. A similar RFD pattern is observed in most CFSs (Brison et al. 2019).
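As a minimal sketch of how an RFD profile such as the one described in this legend can be read, the example below computes RFD per window as (R - L) / (R + L), where R and L are the counts of reads assigned to rightward and leftward forks. It then labels negative-to-positive sign changes as initiation and positive-to-negative sign changes as termination. The function names and the toy counts are hypothetical. Real profiles are noisy and require smoothing or statistical segmentation rather than a simple sign test.

```python
def rfd_profile(rightward, leftward):
    """RFD per window: (R - L) / (R + L); None where a window has no reads."""
    profile = []
    for r, l in zip(rightward, leftward):
        total = r + l
        profile.append((r - l) / total if total else None)
    return profile


def sign_changes(rfd):
    """List (kind, left_window, right_window) for each RFD sign change.

    Upward changes (negative to positive) mark initiation zones and
    downward changes mark termination zones. Windows with RFD equal to
    0 or None are skipped.
    """
    events, prev_i, prev = [], None, None
    for i, value in enumerate(rfd):
        if value is None or value == 0:
            continue
        if prev is not None and (prev < 0) != (value < 0):
            kind = "initiation" if value > 0 else "termination"
            events.append((kind, prev_i, i))
        prev_i, prev = i, value
    return events


# Toy counts mimicking a large gene flanked by two initiation zones
rightward = [1, 9, 9, 5, 1, 1, 9]
leftward = [9, 1, 1, 5, 9, 9, 1]
print(sign_changes(rfd_profile(rightward, leftward)))
```

With these toy counts, the output lists an initiation event at each end and a termination event in the middle, which is the pattern the legend describes for WWOX.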

Regardless of its molecular cause, under-replication leaves the two sister chromatids linked together at the end of S phase (Fig. 5a). At this point, these structures could be resolved by a series of endonucleases (Guervilly et al. 2015; Naim et al. 2013; Ying et al. 2013) that could be recruited to disassemble the replication forks (Deng et al. 2019) and can create single and/or double-strand breaks, giving the cells a last chance to repair the damage during the early stages of mitosis. Importantly, several recent studies have discovered that an E3 ubiquitin-protein ligase, TRAIP (TRAF interacting protein), makes an important contribution to driving replisome disassembly during mitosis and promoting fork breakage (Sonneville et al. 2019; Wu et al. 2019; Deng et al. 2019). This event might allow factors involved in mitotic DNA synthesis (MiDAS) (Minocherhomji et al. 2015), a form of break-induced replication (BIR), to have access to the under-replicated CFSs (see Ovejero et al. 2020 for a review). The CFS is expressed if the broken DNA is not properly repaired.

Transcription-mediated replication stresses and human diseases

Defects in DNA replication processes can lead to various diseases. In the following sections, we will focus on some of the most common diseases.

Neurological disorders

R-loops can arise from a variety of cellular stresses and lead to deleterious complications such as transcriptional irregularities, replication defects and genomic instability, which relate to numerous pathologic conditions (reviewed in Richard and Manley 2017). Among them, various neurological disorders have been linked to R-loops and gene-specific repeat expansions (Table 1).

Table 1 Overview of the neurological disorders associated with transcription-mediated replication stress

Trinucleotide repeat expansions within genes provide additional risk for harmful R-loop formation that disrupts proper transcription and normal gene expression. For example, genes such as huntingtin (HTT; Huntington’s disease), ataxin 1/2 (ATXN1/ATXN2; spinocerebellar ataxias) and frataxin (FXN; Friedreich ataxia) all contain GC-rich or GAA trinucleotide expansions that form R-loops in vitro and associate with disease (Reddy et al. 2011; Loomis et al. 2014). The mechanism of fragile X syndrome (FXS) is also related to the trinucleotide expansion in the 5′ untranslated region (UTR) of the FMR1 gene, which leads to DNA methylation-mediated silencing of this locus (Groh et al. 2014; Colak et al. 2014). The expansion favors the formation of transcription-dependent R-loops, which are resistant to degradation and co-localize with the repressive H3K9me2 chromatin mark. By performing a nascent nuclear run-on analysis, Groh and colleagues showed that in Friedreich ataxia patient cells, R-loops over expanded repeats can block RNA polymerase II transcription of the FXN gene. In FXS patients, the FMR1 allele with a (CGG)n>200 expansion in the 5′ UTR is completely methylated and transcriptionally silenced (Santoro et al. 2012; Groh et al. 2014). To test the role of such R-loop formation in trinucleotide expansion diseases, FMR1 transcription has been reactivated by using the DNA methylation inhibitor 5-aza-2′-deoxycytidine (5-azadC) (Groh et al. 2014). A fourfold increase in R-loops has been observed over the exon 1 region upstream of the expansion in FXS cells, while in control cells, changes are not significant. The specificity of this R-loop signal has been confirmed by RNase H treatment. These findings suggest that transcription-dependent R-loops are localized to the expanded (CGG) repeat region to regulate the expression of the FMR1 gene.
Meanwhile, increasing R-loop formation leads to transcriptional repression of the FXN gene, suggesting a direct molecular association between R-loop formation and the pathology of Friedreich ataxia (FRDA) (Groh et al. 2014). The formation of R-loops over expanded repeats might, therefore, favor FXN and FMR1 silencing, and might represent a common feature of nucleotide expansion-associated diseases, contributing to the corresponding pathology in vivo (Groh et al. 2014). Interestingly, FXS cells exhibit high levels of chromosome breaks, in particular under replication stress (Chakraborty et al. 2019). More importantly, FMRP, the protein product of FMR1, is required for abating R-loop accumulation, thereby preventing chromosome breakage (Chakraborty et al. 2019). These data provide a detailed mechanism for the direct link between R-loop formation, replication stress and genome instability in FXS.

Active pathways that have a role in avoiding transcription–replication collisions and R-loop accumulation could be altered, leading to DNA damage and human diseases including neurological disorders (reviewed in Zeman and Cimprich 2014). For example, dysfunctional TREX1 or RNase H is responsible for Aicardi–Goutières syndrome, which is characterized by severe neurological dysfunction and a congenital infection-like phenotype (Lim et al. 2015). Mutations in aprataxin (APTX), a protein present in the same pathway as RNase H, induce the neurological disorder ataxia with oculomotor apraxia 1 (AOA1), characterized by cerebellar degeneration (Tumbale et al. 2014). Neurodegenerative disorders have also been associated with the loss of DNA helicases that have a clear role in the replication stress response. Of note, loss of SMARCAL1, which functions at the interface of replication and transcription (Baradaran-Heravi et al. 2012), leads to Schimke immuno-osseous dysplasia (SIOD), a multisystem disorder characterized by notable neurologic manifestations. Another example is the loss of the helicase SETX, which is involved in avoiding the formation of aberrant DNA:RNA hybrids. SETX has been associated with juvenile amyotrophic lateral sclerosis (ALS4) and ataxia–ocular apraxia (Moreira et al. 2004; Lavin et al. 2013). It should be noted that mature neurons are non-cycling cells; therefore, R-loops would act either on neurons in a replication-independent manner, or on neuronal precursors in a manner linked to the DNA replication process. The extent to which R-loops contribute to these diseases via replication-dependent and/or replication-independent mechanisms needs to be further investigated.

DSB repair through canonical NHEJ is important for the development of primary neural stem/progenitor cells (NSPCs) (Gao et al. 1998). Previous studies have demonstrated the presence of recurrent endogenous DSBs using high-throughput genome-wide translocation sequencing (HTGTS) (Chiarle et al. 2011; Frock et al. 2015), which is a sensitive DNA break joining assay using “bait” DNA breaks introduced on different chromosomes to reveal endogenous “prey” DNA breaks. Recurrent DSB clusters (RDCs) have been mapped in NSPCs in response to replication stress induction (Wei et al. 2016, 2018). The NSPC-RDCs are enriched in the gene bodies of large (> 100 kb), late-replicating genes. Considering that these characteristics (i.e., large active genes at late-replicating regions) are often associated with CFSs, and that most RDCs are only present after aphidicolin treatment to induce a mild replication stress (Wei et al. 2016), a common mechanism (i.e., transcription-dependent suppression of initiation across large genes) might underlie these events.

Other studies have suggested that TRC might also function in RDC formation (reviewed in Bouwman and Crosetto 2018). Importantly, several neurodevelopmental and neuropsychiatric disorders have been linked to NSPC RDC-containing genes and the activity of neural cell adhesion and/or regulation of synapse formation. For example, molecules involved in cell–cell adhesion and neural development and growth (including the cadherin-associated proteins Ctnna2 and Ctnnd2, the cadherin Cdh13, Cadm2, the membrane proteins Csmd1 and Csmd3, the glycoprotein Lsamp, the cell adhesion molecules Mdga2, Ntm, Sdk1, Npas3, the neurexin family members Nrxn1/3, and the excitatory neurotransmitter receptor Grik2) are associated with numerous diseases, including attention deficit hyperactivity disorder (ADHD) (Lesch et al. 2008), intellectual disabilities (Belcaro et al. 2015; Motazacker et al. 2007), schizophrenia (Børglum et al. 2014; Donohoe et al. 2013), bipolar disorder (Ferreira et al. 2008; Nurnberger et al. 2014; Noor et al. 2014) and autism spectrum disorder (ASD) (Turner et al. 2015; Hu-Lince et al. 2005; Vaags et al. 2012; Casey et al. 2012). Interestingly, mutations linked to cerebellar ataxia and microcephaly syndrome have been found in the WW domain-containing oxidoreductase (WWOX) gene, within FRA16D, a well-studied CFS (Abdel-Salam et al. 2014; Mallaret et al. 2014). Likewise, the PARKIN (PARK2) gene, located within another CFS locus, FRA6E, is involved (via germline mutation) in Parkinson’s disease pathogenesis (Denison et al. 2003). Thus, the formation of RDCs and the CFS loci are highly associated with the gene fragility that underlies the most frequent neuronal disorders.

Cancer

The conflicts between replication and transcription are related to oncogene-induced replication stress and consequently to genomic instability, which is a hallmark of cancer (Gaillard et al. 2015; Kotsantis et al. 2016; Jones et al. 2013) (Table 2). For example, increased transcriptional activity induced by H-RAS overexpression causes replication stress, which depends on R-loop accumulation (Kotsantis et al. 2016). Stork and colleagues showed that treating estrogen receptor-positive (ER+) human breast cancer cells with estrogen (E2) promotes E2-activated transcription and an increase in DSBs together with R-loop formation, which colocalize particularly in regions of the genome containing estrogen-activated genes (Stork et al. 2016). In addition, replication stress induced by oncogene activation during tumorigenesis is associated with increased replication initiation within intragenic regions, leading to conflicts between replication and transcription and to genomic instability (Jones et al. 2013). As described earlier, cyclin E and its catalytic partner CDK2 form the cyclin E/CDK2 complex, the activity of which can be regulated at multiple levels and seems to be involved in triggering DNA replication initiation and in regulating genes important for proliferation and progression through the S phase (Ekholm-Reed et al. 2004). When deregulated, cyclin E is involved in tumorigenesis, and is overexpressed in many cancer types (Cooley et al. 2010; Fukuse et al. 2000; Niu et al. 2015). Importantly, somatic cells can tolerate the replication stress induced by oncogenes such as cyclin E for several cell cycles before undergoing chromosomal breakage (Neelsen et al. 2013) that could constitute an initiating event in cancer.
Alterations of cyclin E and cyclin A2 (encoded by the CCNE1 and CCNA2 genes, respectively) have been identified in a subgroup of hepatocellular carcinoma (HCC), named CCN-HCC: here, rearrangements of CCNE1 promoter regions and recurrent fusions involving CCNA2 have been found. CCN-HCC is characterized by the accumulation of hundreds of tandem duplications and templated insertion cycles (Bayard et al. 2018). Under cyclin E overexpression, BIR, which is involved in DSB and damaged replication fork repair, is required for cell cycle progression (Costantino et al. 2014). Because chromosome rearrangements often occur during BIR upon oncogene activation (Smith et al. 2007), the rearrangements found in CCN-HCC together with the enrichment of breakpoints in early-replicated and actively transcribed regions might be associated with BIR mechanisms caused by replication stress.

Table 2 Overview of tumors associated with transcription-mediated replication stress

The contribution of loss of BRCA1 and BRCA2 function to cancer development has been well established, particularly in breast and ovarian cancers. Tandem duplications (~ 10 kb in length) frequently observed in BRCA1 mutant breast and ovarian cancers are generated by a replication restart-bypass mechanism, which is completed by end joining or by microhomology-mediated template switching (Willis et al. 2017). This finding supports the view that BRCA1 and BRCA2 have an important role in protecting replication forks (Xu et al. 2017b; Schlacher et al. 2011). When lacking the protective effects that these genes confer against replication fork collapse, cells show an increase in DSBs. These cancer cells lacking BRCA1/2 are therefore more sensitive to PARP (poly ADP ribose polymerase) inhibitors such as olaparib, rucaparib, niraparib or talazoparib, which can block an alternative repair pathway used by cells (reviewed in Ubhi and Brown 2019). PARP inhibitors are now used frequently as a targeted therapy for cancers with defective BRCA1/2 or other critical HR components, such as Rad51. Interestingly, the cancer-associated genotoxic stress that arises from mutations in BRCA1/2 can be partially rescued by overexpressing RNase H1 in cancer cell lines, suggesting that aberrant R-loop formation also contributes to malignancy (Hill et al. 2014; Hatchi et al. 2015; Zhang et al. 2017).

In addition, Ewing’s sarcoma has been linked to damage-induced transcription, an accumulation of R-loops related to transcriptional stress, and subsequent depletion of functional BRCA1, all of which ultimately result in DNA damage (Gorthi et al. 2018). Moreover, R-loops might have a role in the oncogenic c-MYC/IGH translocation commonly seen in Burkitt’s lymphoma and multiple myeloma. Here, the Tudor domain-containing protein 3 (TDRD3) forms a complex with TOP3B and is recruited to the c-MYC CpG island promoter, where it prevents R-loop accumulation and suppresses chromosomal translocations (Küppers and Dalla-Favera 2001; Shou et al. 2000; Yang et al. 2014). Finally, cancer-derived somatic SLX4 mutations and Hoyeraal–Hreidarsson syndrome (HHS)-associated germline RTEL1 mutations, which abrogate the SLX4–RTEL1 interaction, affect the recruitment of FANCD2 to RNA Pol II to resolve R-loops arising from transcription-induced replication stress, and contribute to cancer development (Takedachi et al. 2020).

As described previously, large genes expressed in NSPCs are prone to DSBs and translocations. Genes identified within RDCs are also frequently altered in different tumors (Wei et al. 2016). For example, LSAMP is contained in a small region that is frequently deleted and it has been assigned a tumor suppressor role (Kresse et al. 2009). The cadherin CDH13 is involved in cell–cell adhesion activity and neural growth, and is deleted in different tumor types (Kawakami et al. 1999; Kadota et al. 2010; Sato et al. 1998). The NRXN3 synaptic cell surface protein is altered in medulloblastoma. In prostate cancer, CADM2 and CSMD3 are rearranged and DGKB is involved in inter-chromosomal gene fusions (Berger et al. 2011; Maher et al. 2009). Moreover, a recent report found that CSMD3 and CSMD1 are included in a group of genes identified as the most frequently mutated in stomach adenocarcinoma (Wang et al. 2020). NPAS3, which helps to regulate genes that are involved in neurogenesis, is deleted in high-grade astrocytoma and glioblastoma (Moreira et al. 2011). Finally, the cell adhesion molecule BAI3 has been implicated in glioma progression (Kee et al. 2004).

Deletions in CFSs are considered to be among the most common genetic alterations observed during tumor development. The first large gene discovered to be spanned by a highly unstable CFS region was fragile histidine triad (FHIT), which is located within FRA3B. FHIT alterations, such as deletions or loss of expression, have been observed in various tumors, including breast cancer and B-cell lymphoma (Pandis et al. 1997; Kameoka et al. 2004). Another example of a gene spanned by a CFS region is WWOX, which is located within the second most active common fragile site, FRA16D (Bednarek et al. 2000; Ludes-Meyers et al. 2003), and is frequently deleted in several tumors (Krummel et al. 2000; Paige et al. 2000). The third most frequent CFS locus is FRA6E, which contains the E3 ubiquitin ligase gene PARK2: here, its inactivation can accelerate cell-cycle progression and induce cyclin D1 accumulation (reviewed in Glover et al. 2017). Like FHIT and WWOX, PARK2 is a tumor suppressor. Deletion of PARK2 has been described in various cancers and causes a loss of its activity (Letessier et al. 2007; Iwakawa et al. 2012; Denison et al. 2003). Loss of PARK2 activity can induce chromosome instability related to tumor formation. This effect might be due to an alteration of several mitosis regulators, such as Plk1, Aurora A/B, Cyclin B1, Cdc20, and UbcH10, which are normally controlled by PARK2. These alterations can lead to mitotic defects, such as prometaphase-like arrest and anaphase and cytokinesis failure. Given that loss of PARK2 induces multiple chromosomal defects, it seems that PARK2 has an important role in maintaining genomic stability (Lee et al. 2015).

Several CFS-associated genes have protective roles by promoting the DDR, which is a critical mechanism to maintain genome stability. Indeed, the inactivation of several tumor suppressors located within CFSs induces DDR de-regulation. In particular, the tumor suppressors FHIT and WWOX have a role in the DDR in regulating apoptosis, which is achieved through interactions with the pro-apoptotic p53 family of transcription factors. Thus, loss of function of these tumor suppressors, together with other gene mutations, such as in p53, has an important role in enhancing the uncontrolled proliferation that promotes genome instability (reviewed in Hazan et al. 2016). Many other genes that span CFS regions, such as CTNNA1/3, DLG2, DMD, GRID2, IL1RAPL1, LRP1B, NBEA and RORA, are well described and linked to different tumor types (reviewed in Gao and Smith 2015). The high number of large genes contained in the CFS regions can be explained by the transcription-mediated suppression of replication initiation within these large genes (Brison et al. 2019) (see previous section for detail), which creates large regions without replication initiation and leads to genome instability under replication stress, as frequently observed in cancer.

Other pathological conditions

Transcription-mediated replication stress is also involved in a number of other pathological conditions, such as immunodeficiencies, infertility, Prader–Willi and facial anomalies syndromes (Table 3). In particular, genome instability induced by co-transcriptional R-loop formation has been linked to Fanconi anemia (FA), a genetic disease characterized by bone marrow failure and a strong predisposition to cancer. FA is caused by germline mutations in any of up to 22 FA genes, including BRCA1/2 (Yamamoto et al. 2005; van Twest et al. 2017; Nepal et al. 2017). Of note, FANCD2, a core FA gene, accumulates at transcribed genes and has a role in resolving R-loops and transcription–replication conflicts by recruiting RNA processing factors (Schwab et al. 2015; García-Rubio et al. 2015). In particular, mono-ubiquitination of the FANCI–FANCD2 (ID2) heterodimer complex depends on FANCL ubiquitin E3 ligase activity and occurs during S phase and under conditions of replication stress (van Twest et al. 2017; Rajendra et al. 2014). Several reports have shown the presence of increased R-loops in FA mutant cells (Schwab et al. 2015; García-Rubio et al. 2015; Liang et al. 2019), indicating that FANCD2 mono-ubiquitination is required to prevent their accumulation and for the colocalization of FANCD2 with R-loops in actively transcribed genomic regions. Although BRCA1 and BRCA2 also belong to the FA gene family, surprisingly, breast or ovarian cancer rarely, if ever, develops in FA patients. It should be noted that FA is primarily an autosomal recessive genetic disorder, in which two mutated alleles are required to cause the disease, while BRCA1/2 defects linked to breast or ovarian cancer are mostly found in heterozygous carriers. Patients with biallelic BRCA2 loss (BRCA2−/−) generally die from complications of aplastic anemia well before the age of developing breast or ovarian cancer.
In addition, FA patients carrying BRCA1 biallelic mutations have not been identified, suggesting biallelic loss of BRCA1 might be lethal to the embryo (reviewed in D’Andrea 2010). It is not completely clear how the loss of a single DNA-repair pathway can induce bone marrow failure, developmental abnormalities and a predisposition to cancer in FA patients; we anticipate that this point will continue to be a hot topic in the field.

Table 3 Other pathological conditions associated with transcription-mediated replication stress

Prader–Willi syndrome (PWS) is a genetic disorder that is caused by the loss of paternal gene expression in the 15q11-q13 chromosomal region, due to small deletions of the SNORD116 locus. Interestingly, R-loops form within the G-rich repeats of the SNORD116 locus, inducing nucleosome displacement in a transcription-dependent manner and chromatin decondensation of the paternal allele (Powell et al. 2013). The SNORD116 locus mediates the effects of topotecan, which induces an increase in R-loops and stalling of transcriptional progression. Among the genetic syndromes related to R-loop formation, immunodeficiency, centromeric region instability and facial anomalies (ICF) syndrome has been described. This syndrome is caused by mutations in DNA methyltransferase 3B (DNMT3B) and is characterized by sub-telomeric hypomethylation associated with atypically short telomere length. Transcription of telomeric repeat-containing RNA (TERRA) has an important role in regulating telomere length and replication. Mature TERRA RNA forms DNA:RNA hybrids with the C-rich DNA template: these telomeric hybrids are present in telomerase-positive cancers (Arora et al. 2014). Moreover, in ICF cells, telomere shortening or loss increases TERRA transcription levels, indicating that telomere hybrids are involved in promoting instability at the telomeric ICF regions. Indeed, Sagie and colleagues demonstrated that telomere hybrids enhance telomere shortening together with other unknown factors that regulate the length of telomeres, suggesting the contribution of epigenetic modifications (e.g., compromised methylation by DNMT3B) in telomere-specific length regulation (Sagie et al. 2017). The relationship between DNA:RNA hybrids, replication stress and genome instability in these disorders, and how this relationship could be used to find additional targeted therapies, need to be further investigated in future studies.

Conclusion and perspectives

In conclusion, studies over the past few years have provided new and important insights into replication stress and genome instability. Increasing evidence supports the view that gene transcription has an essential role in shaping the landscape of human genome replication, while it is also a major source of endogenous replication stress inducing genome instability and leading to human diseases. Transcription-mediated replication stresses arise at both early- and late-replicating regions via two major mechanisms: head-on transcription–replication conflicts frequently occur at the transcription termination sites of highly expressed genes in the early-replicating regions, while transcription-dependent suppression of initiation across large genes, which creates large origin-poor regions, is responsible for CFS instability in the late-replicating regions. Due to technical limitations, most studies have only used cell lines as their model system. Ongoing development of high-throughput single-molecule (Müller et al. 2019; Klein et al. 2017) and single-cell (Dileep and Gilbert 2018; Takahashi et al. 2019) approaches to study the DNA replication program will provide novel tools to directly address these questions using patient samples. We expect that this advancement will bring new insights into the detailed mechanisms by which transcription-mediated replication stress impacts genome instability and human diseases. These insights will help to better select the patients who are likely to respond to a given targeted therapy (such as PARP, ATR or TOP1 inhibitors) directed against factors involved in the corresponding processes, and to further develop new targeted therapies to better fight against cancers and other human diseases.