1 Introduction

Forty-five years ago, François Jacob wrote “A bacterium, an amoeba… what destiny can they dream of other than forming two bacteria, two amoeba?” (Jacob 1970). DNA replication is at the heart of cell division. In each cell, this process starts from specific sites along the genome called DNA replication origins. According to the replicon model (Jacob et al. 1963), a DNA replication origin is a genetically defined sequence with which specific proteins interact. A single origin, with a specific sequence, is sufficient to replicate the small bacterial genome. This model has been fully validated in prokaryotes as well as bacteriophages and plasmids. It has been extended to eukaryotic DNA viruses and also Saccharomyces cerevisiae, where a 12–17 bp-specific consensus sequence is common to replication origins. In metazoans, 30,000 to 50,000 origins are used during each cell cycle in order to replicate all the chromosomes before mitosis. However, the nature of these DNA replication origins remains poorly defined. Despite important recent progress in this field, what determines a metazoan DNA replication origin and how origins are assembled on chromatin remain obscure.

This lack of knowledge has hampered the elucidation of the multiple components involved in origin recognition and the structures they form on chromosomes. These structures should be tightly controlled, as each DNA replication origin must be activated once and only once during the cell cycle to avoid gene amplification and genome instability. Defects in the initiation of replication, such as over-replication, may promote genomic instability leading to chromosomal rearrangements and tumor development or progression.

In addition, cross talk between proliferation and differentiation regulates embryonic development and adult tissue renewal. DNA replication must be tightly controlled in order to be coordinated with transcription programs that are engaged during differentiation and to keep the memory of specific chromatin features during DNA replication.

In this review, we address the genetic and epigenetic signatures of replication origins and their organization along chromosomes and within the nucleus and discuss how the replication initiation complex is formed and how replication origins are regulated during embryonic development.

2 Genetic and Epigenetic Signatures

DNA replication origins are assembled and activated in a two-step process that takes place during the G1- and S-phases of the cell cycle, respectively. Therefore, specific signatures may be associated with both steps or with each individual step.

The concept of a replication origin as a genetically distinct sequence with which specific proteins interact was first proposed for bacteria (Jacob et al. 1963) and was fully validated in prokaryotes, bacteriophages, plasmids, and, later on, also in eukaryotic DNA viruses and S. cerevisiae, where DNA replication origins share a 12–17 bp AT-rich specific consensus sequence (see Méchali 2010 for a review). In multicellular eukaryotes, the initial characterization of only a limited number of origins did not permit the identification of such a consensus element. In recent years, genome-wide analyses have allowed the identification of replication origins on a larger scale, in the mouse (Sequeira-Mendes et al. 2009; Cayrou et al. 2011, 2012), Drosophila melanogaster (Cayrou et al. 2011), human (Cadoret et al. 2008; Karnani et al. 2010; Mesner et al. 2011; Martin et al. 2011), and Arabidopsis thaliana genomes (Costas et al. 2011). In all these species, replication origins are located at precise sites along the genome. A relationship between CpG islands found at promoters of house-keeping genes and replication origins was identified in human cells (Delgado et al. 1998; Cadoret et al. 2008). Eukaryotic genomes, as with prokaryotes, display asymmetries in the prevalence of G/C over T/A bases in the leading and lagging replication strands. The profile of the calculated base “skew” along the chromosomes reveals characteristic U- and N-shaped patterns whose boundaries are associated with the presence of replication origins (Touchon et al. 2005). A common, unexpectedly G-rich, repeated consensus element named OGRE (for origin G-rich repeated element) was found in mouse and human origins (Cayrou et al. 2011, 2012), a result in contrast with the AT-richness of bacterial and yeast origins. 
The element that constitutes OGREs includes repeated stretches of guanosine bases, and this pattern matches well the requirements for formation of the alternative nucleic acid structures known as G-quadruplexes (G4s) (Cayrou et al. 2011, 2012). Initiation of DNA synthesis starts at a short, precise distance downstream of these elements (Cayrou et al. 2012). Potential G4s were further detected at origins in human (Besnard et al. 2012) and chicken (Valton et al. 2014) cells, although neither of these two studies reported the corresponding consensus element.
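The base skew mentioned above can be illustrated with a short sketch. It assumes the commonly used definitions S_GC = (G − C)/(G + C) and S_TA = (T − A)/(T + A), summed per window; the function names, window size, and toy sequence are hypothetical and are not taken from the cited studies.

```python
def window_skew(chunk):
    """Total skew of one window: (G-C)/(G+C) + (T-A)/(T+A).

    Assumed definition for illustration; a component with no
    informative bases contributes 0.
    """
    chunk = chunk.upper()
    g, c = chunk.count("G"), chunk.count("C")
    t, a = chunk.count("T"), chunk.count("A")
    s_gc = (g - c) / (g + c) if (g + c) else 0.0
    s_ta = (t - a) / (t + a) if (t + a) else 0.0
    return s_gc + s_ta


def skew_profile(seq, window=1000):
    """Return a list of (window_start, skew) over non-overlapping windows."""
    profile = []
    for start in range(0, len(seq) - window + 1, window):
        profile.append((start, window_skew(seq[start:start + window])))
    return profile


if __name__ == "__main__":
    # Toy sequence: the skew changes sign between the two windows.
    toy = "GGGCTTTA" + "CCCGAAAT"
    print(skew_profile(toy, window=8))
```

In a real analysis, an abrupt sign change of the skew along a chromosome would be the feature of interest, since such boundaries are the ones reported to coincide with replication origins.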

G-quadruplexes are structures formed between four G-rich single-stranded stretches of nucleic acid (Lipps and Rhodes 2009). When four guanosine bases are placed in horizontal juxtaposition (typically derived from a single DNA strand looping over several times), they can associate with non-Watson-Crick hydrogen bonding to form a planar structure known as a G-tetrad. Where three or more G-tetrads stack on top of each other, stabilized by metal cations (in a cellular context, Na+ or K+), this energetically favorable structure is known as a G-quadruplex (G4). In vitro biophysical and structural studies have shown that G4s can form with a variety of G-rich nucleic acids under physiological conditions and may exhibit great structural diversity and flexibility in terms of strand topology (Burge et al. 2006), the size and composition of interstrand loops (Guédin et al. 2010), and inter-tetrad bulges (Mukundan and Phan 2013). Bioinformatic analyses revealed that G4 motifs are present with multiple occurrences within the genomes of all organisms tested. In the human genome, G4 motifs are enriched at telomeres, which contain multiple repeats of the sequence (TTAGGG)n, in ribosomal DNA, at immunoglobulin switch regions, and variable number tandem repeats (Davis and Maizels 2011). In the context of genes, motifs peak in promoter regions (Huppert and Balasubramanian 2007), in 5′-untranslated regions (UTRs), and at the 5′ ends of introns, the frequency dropping with each successive intron (Maizels and Gray 2013).
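As an illustration of how such bioinformatic searches can work, the sketch below scans a sequence for potential G4 motifs. It assumes a widely used simplified pattern (four runs of at least three guanines separated by loops of 1 to 7 nucleotides); this pattern, the function name, and the loop limits are assumptions for illustration and do not capture bulged or long-loop G4s described above.

```python
import re

# Assumed simplified motif: G3+ N1-7 G3+ N1-7 G3+ N1-7 G3+
G4_PLUS = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")
# The same motif on the complementary strand appears as C-runs.
G4_MINUS = re.compile(r"C{3,}(?:[ACGT]{1,7}C{3,}){3}")


def find_g4_motifs(seq):
    """Return sorted (start, end, strand) tuples of non-overlapping matches.

    Coordinates are 0-based, end-exclusive, on the given strand.
    """
    seq = seq.upper()
    hits = [(m.start(), m.end(), "+") for m in G4_PLUS.finditer(seq)]
    hits += [(m.start(), m.end(), "-") for m in G4_MINUS.finditer(seq)]
    return sorted(hits)


if __name__ == "__main__":
    telomeric = "TTAGGG" * 4  # four telomeric repeats
    print(find_g4_motifs(telomeric))
```

A match only indicates that the sequence is compatible with G4 formation under this simplified rule; whether a structure actually forms in vivo is a separate experimental question.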

Evidence that G4s can regulate replication origins was obtained using a recombination-based approach in chicken DT40 cells, by introducing specific deletions and mutations in the G4 motif associated with two model origins (Valton et al. 2014). How could the presence of G4s at particular genomic locations contribute to the creation of active origins at proximal loci? Forming an active origin requires the binding of the pre-replication complex (pre-RC), of which the hexameric origin recognition complex (ORC) is an essential part. Could G4s provide locations on the genome for the binding and loading of ORC hexamers, which eventually find their way to the “correct” origin sites for pre-RC formation? Some evidence in support of this notion comes from biochemical studies showing that binding can occur between reconstituted recombinant ORC and DNA oligonucleotides containing an “artificial” G-rich repeat such as (GGGTT)n (Hoshina et al. 2013); however, an interaction between a genome-derived G-rich sequence and endogenous ORC has yet to be reported. The binding of ORC to G4s is in agreement with the position of the OGRE/G4 motif upstream of the initiation site, where the pre-RC is likely to form. Interestingly, recent analyses of replication origins in different yeast species showed that GC-richness at these sites is not a feature restricted to metazoans: this property is also present in Schizosaccharomyces japonicus (Xu et al. 2012) and the budding yeast Pichia pastoris (Liachko et al. 2014).

Nevertheless, consensus signatures are not sufficient to explain the localization of replication origins in the genome. Although most origins contain the OGRE/G4 element, these elements are also found at other places in the genome. To explain this result, one could consider that there are many more origins than those detected by the present state of the art. Another explanation is that combinations of several genetic and epigenetic signatures characterize replication origins.

Several epigenetic marks have been described at replication origins in recent years. A major feature is the association of open chromatin with replication origins. In S. cerevisiae, despite their sequence specificity, forcing the positioning of a nucleosome at a DNA replication origin inhibits initiation of replication (Simpson 1990). Chromatin marks favoring open chromatin at replication origins include histone acetylation (Aggarwal and Calvi 2004; Iizuka et al. 2006; Miotto and Struhl 2010; Costas et al. 2011; Liu et al. 2012; Chen et al. 2013), as well as histone methylation. H4K20 methylation is a prominent feature that has been implicated in replication origin activity (Tardat et al. 2010; Beck et al. 2012; Valton et al. 2014).

The H4K20me1 mark appears to be more related to the formation of the pre-RC in G1-phase (Rice et al. 2002). It could play a role in recruiting ORC through the binding properties of Orc1 and LRWD1, whereas H4K20me3 may play a role in replication origin selection during S-phase (Beck et al. 2012). Two other methylation marks associated with origins are H3K79me2, which may be involved in preventing rereplication at some origins (Fu et al. 2013), and H3K56me1 (Yu et al. 2012).

3 Flexibility in Replication Origin Firing

Only a minority of potential replication origins is activated in a given cell during each cell cycle. Although this was already predicted in pioneering studies showing that replication stress can activate new replication origins (Taylor 1977) and was then confirmed in yeast models (Friedman et al. 1997), this finding regained interest only in very recent years. Genome-wide analyses performed in Schizosaccharomyces pombe (Heichinger et al. 2006) and in mouse cells (Cayrou et al. 2011) showed that no more than one third of replication origins were used during each cell cycle in each individual cell. However, the subset of origins used is not the same in each cell. The variation in the choice of origins to be activated in each cell is difficult to determine, because it is currently technically impossible to perform single-cell genome-wide mapping of replication origins. If replication origin activation were mainly stochastic, the choice of the sets of origins to be activated would be based only on the growth conditions and chromosome organization, which also varies from cell to cell (Nagano et al. 2013). In this case, the main requirement would be to have enough replication origins to deal with the cell cycle, without preferential choice. The unused replication origins would serve as spare or dormant origins, to be activated only in the case of fork progression problems (often called replicative stress) due to DNA damage or poor growth conditions (Blow et al. 2011; McIntosh and Blow 2012). Therefore, this excess of replication origins would represent an important genome safeguard mechanism to ensure that the entire genome is duplicated during each cell cycle.

Alternatively, replication origins might be in excess to allow some flexibility to the cell for origin usage, according to its gene expression profile. In this model, the organization of DNA replication origins could be associated with the organization of chromosomal domains for cell fate and cell identity, a process linked to development. DNA replication origins are indeed developmentally regulated in frogs (Callan 1974; Hyrien et al. 1995), as well as in Drosophila cells (Blumenthal et al. 1974; Sasaki et al. 1999). Moreover, the pattern of DNA replication origins can be entirely reprogrammed when differentiated nuclei are exposed to an early embryonic cell context (Lemaitre et al. 2005). This relationship between DNA replication and gene expression programs is in line with recent findings pointing to correlations between DNA replication timing and chromosome structure, organization and position in the nucleus, and also chromatin marks. This relationship might also explain how the memory of ongoing differentiation programs is maintained during successive cell divisions.

4 Spatial Organization of Replication Origins

4.1 Replication Foci

An important aspect by which replication origins are organized is their spatial arrangement in the nucleus. Early observations that replicating DNA is localized to discrete sites within nuclei, using 3H-thymidine labeling followed by electron microscopy and autoradiography, were made over 50 years ago (Revel and Hay 1961). Subsequent studies at higher spatial resolution, using labeling based on the nucleotide analogues bromodeoxyuridine (Nakamura et al. 1986) or biotin-dUTP (Nakayasu and Berezney 1989), combined with fluorescence microscopy, revealed that replication takes place at hundreds of subnuclear sites, which became known as “replication foci,” indicating that this process is subject to tightly controlled compartmentalization. These foci can also form de novo on purified DNA assembled in pseudonuclei in Xenopus laevis egg extracts (Cox and Laskey 1991).

Studies in which a pulse of label was introduced into synchronized cells at different times during replication revealed changes in the number, size, and morphology of replication foci during S-phase (Yanishevsky and Prescott 1978; Nakamura et al. 1986; Nakayasu and Berezney 1989; Schermelleh et al. 2007). These replicating nuclei display three sequential types of pattern (Nakayasu and Berezney 1989): Type I (early S-phase) consists of a few hundred small dots roughly evenly distributed. Type II foci (mid S-phase) are slightly larger and fewer in number, showing accumulation on the inner nuclear envelope and around nucleoli. Type III foci (late S-phase) take the form of sparser and larger “patches,” some of which adopt ring- or horseshoe-like shapes.

So what are these replication foci, and what is their relation to origins? The nature and composition of foci remain poorly characterized, but some components have been identified through the colocalization of various factors required for DNA replication, including proliferating cell nuclear antigen (PCNA; Celis and Celis 1985), replication protein A (RPA; Adachi and Laemmli 1992), DNA methyltransferase 1 (Dnmt1; Leonhardt et al. 1992), DNA polymerase α (Hozak et al. 1993), Cyclin A and Cdk2 (Cardoso et al. 1993; Sobczak-Thepot et al. 1993), and DNA ligase (Montecucco et al. 1995). In contrast, known origin-binding proteins such as the pre-RC components Cdt1, Cdc6, and ORC subunits occasionally show a punctate pattern, but one that does not correspond well to foci of newly synthesized DNA. For Cdc6, MCM7, Orc1, and Orc2, more distinct localization to non-chromatin structures was observed when the nucleus was treated with cross-linking agents before extraction (Fujita et al. 2002), but the colocalization with the known replication focus component PCNA is only partial. Together, these observations indicate that the foci observed are sites of active replication rather than replication origins themselves.

One of the current working models for the functional organization of replication units is the “flexible replicon model,” which is consistent with data from origin mapping studies, with origin spacing measured by DNA combing analyses (Cayrou et al. 2011), and with observations of subnuclear foci. Here, the word “replicon,” originally applied to bacteria, has been adapted to fit the context of metazoan replication origins, their interaction, and regulation. A flexible replication unit appears as a series of adjacent potential origins of replication (3–4 on average), in proximity along the genome, with only one activated in each cell in a given cell cycle, resulting in a mean inter-origin spacing of 100–120 kb. In addition, the activated origin inside this replication unit may vary from cell to cell, even in the same cell population. This flexibility in origin usage within each replication unit is an important characteristic of eukaryotic cells. Origin choice within a given replicon may be stochastic or may be determined by cell identity or developmental stage (Cayrou et al. 2011).
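The arithmetic behind the flexible replicon model can be made concrete with a toy simulation. The sketch below assumes, purely for illustration, evenly spaced potential origins (30 kb apart, a hypothetical value), four potential origins per replication unit, and a uniformly random choice of one active origin per unit in each cell; none of these parameters or function names come from the cited work.

```python
import random


def simulate_active_origins(n_units, origins_per_unit, potential_spacing_kb, rng):
    """Pick one active origin per replication unit in a single cell.

    Returns the positions (kb) of the active origins along the chromosome.
    """
    positions = []
    unit_length = origins_per_unit * potential_spacing_kb
    for unit in range(n_units):
        chosen = rng.randrange(origins_per_unit)  # stochastic choice (assumption)
        positions.append(unit * unit_length + chosen * potential_spacing_kb)
    return positions


def mean_spacing(positions):
    """Mean distance (kb) between consecutive active origins."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return sum(gaps) / len(gaps)


if __name__ == "__main__":
    cell_a = simulate_active_origins(1000, 4, 30, random.Random(1))
    cell_b = simulate_active_origins(1000, 4, 30, random.Random(2))
    print("mean inter-origin spacing, cell A (kb):", round(mean_spacing(cell_a), 1))
    shared = len(set(cell_a) & set(cell_b))
    print("active origins shared by both cells:", shared, "of", len(cell_a))
```

Under these assumptions, one active origin out of four potential origins spaced 30 kb apart gives a mean spacing close to 120 kb, while two simulated cells share only a fraction of their active origins, which mirrors the cell-to-cell variability described above.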

A second level of organization is the formation of a group of replication units, termed a “replicon cluster,” whose active origins are brought together in space to form a replication focus (or functionally, a “replication factory”). The intervening DNA in each replicon loops out, forming a shape resembling a rosette (Vogelstein et al. 1980). The firing of replicons within a cluster occurs coordinately – a phenomenon that may be due to their induced proximity.

What could hold active replication centers together to make factories? As foci remain poorly characterized, this question remains open, and there are several non-mutually exclusive possibilities that may be relevant. One is that replication forks contain proteins that dimerize, promoting self-assembly between adjacent replicons. A second idea is that replication forks are embedded in an as-yet-uncharacterized substance that holds them together. The composition of foci is not known in detail, as it has not yet been possible to physically isolate them from chromatin and other insoluble nuclear structures for analysis by, for example, proteomics. A third idea is that replicons are held together by entrapping the looping DNA of adjacent replicons. Support for this third idea came from a recent study indicating that the cohesin complex may play a key role (Guillou et al. 2010). Cohesin was found to physically interact with the MCM2-7 complex and to show enrichment at origin sites. Depletion of cohesin subunit Rad21 slowed S-phase independently of a checkpoint response and led to a reduction in the intensity but not the number of DNA replication foci and a lengthening of DNA loops. The cohesin complex forms a tripartite ring capable of topologically entrapping two DNA strands, the best-known role for which is holding together duplicated sister chromatids from S-phase until anaphase (Peters et al. 2008; Losada 2014). The use of a similar topological entrapment to hold together DNA strands from neighboring replicon loops is an entirely plausible mechanism by which cohesin could mediate replication focus integrity.

The advent of super-resolution fluorescence microscopy has opened new windows onto the physical characteristics of subnuclear bodies (Schermelleh et al. 2008; Cseresnyes et al. 2009). One study that used this approach followed by 3D image analysis to discern and quantify BrdU-labeled bodies in S-phase in mouse myoblast cells identified ~4000 spherical S-phase nuclear bodies at any one time; this was estimated to imply that ~40,000 foci form in total during S-phase (Baddeley et al. 2010). However, whether these structures correspond to the same replication factories as those studied over the past two decades or represent newly discovered subnuclear structures awaits further investigation. It is likely that as resolving power and imaging technology continue to improve, yet further, smaller subnuclear bodies may be visualized and counted – it remains to be seen whether and how this will help us to understand the underlying principles behind the nuclear organization of replication origins.

Another recently identified regulator of replication loops and foci is the protein Rap1-interacting-factor-1 (Rif1). In human cells, Rif1 colocalizes with replication foci in mid S-phase (but not early or late S-phase). Depletion of this protein led to major changes to the patterns of subnuclear structures, including the loss of mid S-phase foci, and resulted in increases in the sizes of DNA loops (Yamazaki et al. 2012). Importantly, Rif1 depletion also altered the timing of origin activation: the replication of origins that usually takes place in mid/late S-phase was advanced to early S-phase. Given the conservation of Rif1 and its functions from yeast to human cells, this protein has been proposed as a global regulator of replication timing through regulating higher-order chromatin architecture (Yamazaki et al. 2013).

4.2 The Nuclear Matrix

A further level of organization of replication origins is their localization to the nuclear matrix. Although still poorly defined in terms of composition, numerous observations in live and fixed cells point toward the existence inside the nucleus of a dynamic meshwork of fibers associated with and emanating from the nuclear lamina (Nickerson 2001; Tsutsui et al. 2005; Wilson and Coverley 2013; Razin et al. 2014). The nuclear matrix is usually defined technically, as the structure that remains when nuclei have been processed by permeabilization, digestion of unattached DNA with a nuclease, and removal of proteins (mainly histones) and other factors using a wash solution typically of high ionic strength.

Early studies made key observations that showed the relevance of the nuclear matrix to DNA replication. Firstly, pulse labeling of cells followed by matrix extraction, electron microscopy, and autoradiography showed that the majority of newly replicated DNA was associated with matrix structures (Pardoll et al. 1980). Secondly, permeabilizing nuclei and then treating them with increasing salt concentrations and ethidium bromide leads to the extrusion of DNA in loops of a fixed size range, a structure known as a “halo.” Pulse-chase labeling showed that DNA replication occurs at discrete sites at the base of these loops, with newly replicated DNA traveling outward toward the periphery (Vogelstein et al. 1980).

4.3 SARs, MARs, LADs, and TADs

The finding that some sections of genomic DNA remain attached to the nuclear matrix after DNase digestion and extraction led to efforts to characterize these sections, which are defined by the methods by which they are isolated: scaffold-associated regions (SARs) are resistant to extraction with lithium 3,5-diiodosalicylate (Mirkovitch et al. 1984), whereas matrix attachment regions (MARs) are resistant to 2 M NaCl (Cockerill and Garrard 1986). The overarching term S/MARs is often used to cover both types of DNA segment.

Early sequencing studies of S/MARs identified some characteristics common to these segments, including AT-richness, the presence of curved or kinked DNA, and DNase I hypersensitivity sites (Boulikas 1993). Linnemann et al. used both extraction methods combined with microarray analysis to identify sequences corresponding to SARs and MARs on human chromosomes 14–18 and their intervening loop regions (Linnemann et al. 2009). This revealed that half of SARs and MARs are in common, and their distribution peaks about 500 bp from neighboring genes. A recent study that used next-generation sequencing to characterize MARs in Drosophila embryos identified a series of simple sequence repeats associated with these segments (Pathak et al. 2014).

In an alternative approach, Guelen et al. used the DamID technique and microarray analysis to perform human genome-wide mapping of lamina-associated domains (LADs) (Guelen et al. 2008). These regions range from 1 to 10 Mb in size and are associated with low gene expression, and their borders are demarcated by the insulator protein CTCF, by promoters oriented away from LADs, or by CpG islands.

The field would be greatly advanced by a high-resolution study that maps S/MARs and replication origins in one well-characterized cell type, to identify their sequence features and juxtapositions with each other and with coding and noncoding genes. Careful correlation could also be made with epigenetic marks, such as those mapped by the ENCODE project (ENCODE Consortium 2012).

A link between the organization of chromosomes in distinct large chromatin domains and replication timing domains (Ryba et al. 2010), recently further documented as topologically associating domains (TADs, Pope et al. 2014), emphasizes the importance of the structural organization of replication units in the nucleus, possibly linking these structures with the observation of replication foci.

5 Molecular Players and Regulation of Replication Licensing

5.1 Players of the Replication Initiation Complex

Replication origins are established and activated in two distinct steps. During G1-phase of the cell cycle, origin sequences become loaded with the replicative helicase, the hexameric MCM2-7 complex, resulting in the formation of the pre-RC (Remus and Diffley 2009). This process, also known as origin licensing, results in origin loading of the MCM2-7 double hexamer in an inactive state. In the second step, replicative helicases become activated, resulting in origin firing. This step, known as pre-initiation complex (pre-IC) formation, occurs in S-phase and involves a phosphorylation-dependent association of the helicase with additional subunits. Because the licensing and activation reactions are separated in the cell cycle, reactivation of origins during S-phase, which would cause genomic amplification and instability, is strongly repressed (Vaziri et al. 2003; Neelsen et al. 2013). Cells have evolved several overlapping mechanisms to ensure that rereplication does not take place (see next section).
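The logic of this two-step process can be summarized as a minimal state model. The sketch below is a deliberate simplification with hypothetical names: licensing (MCM2-7 loading) is permitted only in G1, firing is permitted only in S-phase and consumes the licence, and relicensing in S-phase is refused. It does not represent any specific protein or the overlapping inhibitory pathways.

```python
from enum import Enum


class Phase(Enum):
    G1 = "G1"
    S = "S"


class Origin:
    """Toy origin whose licensing and firing are temporally separated."""

    def __init__(self):
        self.licensed = False   # True once an inactive MCM2-7 double hexamer is loaded
        self.fired_count = 0    # number of times this origin has initiated

    def license(self, phase):
        # Licensing (pre-RC formation) is allowed only in G1.
        if phase is Phase.G1 and not self.licensed:
            self.licensed = True
            return True
        return False

    def fire(self, phase):
        # Firing (pre-IC formation) is allowed only in S-phase
        # and removes the licence, so the origin cannot fire again.
        if phase is Phase.S and self.licensed:
            self.licensed = False
            self.fired_count += 1
            return True
        return False


if __name__ == "__main__":
    origin = Origin()
    origin.license(Phase.G1)
    origin.fire(Phase.S)
    origin.license(Phase.S)   # refused: no relicensing in S-phase
    origin.fire(Phase.S)      # refused: the licence was consumed
    print("times fired in this cycle:", origin.fired_count)
```

In this toy model, rereplication would correspond to allowing `license` to succeed during S-phase, which is exactly what the separation of the two steps prevents.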

Pre-RC formation culminates with the chromatin loading of two copies of the MCM2-7 complex, in a head-to-head configuration (also referred to as a “double hexamer”) (Chong et al. 2000; Evrin et al. 2009). This reaction, which occurs from late mitosis to the end of G1-phase, requires the activity of several essential and conserved proteins. First, origins are recognized by the six-subunit origin recognition complex (Orc1-6) (Fig. 2.1) (Bell and Stillman 1992; Diffley and Cocker 1992). The Orc1-5 subunits possess an AAA+ ATPase domain (Speck et al. 2005); however, only Orc1, Orc4, and Orc5 bind ATP (the other subunits have inactivating mutations), and only Orc1 and Orc4 display an essential ATPase activity (Klemm et al. 1997; Bowers et al. 2004). The last ORC subunit, Orc6, was shown to be important for pre-RC maintenance in vivo (Semple et al. 2006). To bind DNA, ORC needs to be ATP bound (Klemm and Bell 2001). Cdc6, another essential AAA+ ATPase (Zhou et al. 1989; Cocker et al. 1996), is then recruited to the origin through interaction with ORC (Liang et al. 1995), mainly through Orc1 (Zhou et al. 1989). The loading of the MCM2-7 complex onto chromatin, and thus licensing, requires the essential protein Cdt1 (Maiorano et al. 2000; Nishitani et al. 2000). Cdt1 does not possess enzymatic activity; however, it provides a link between the ORC and MCM2-7 complexes, as it interacts with both (Ferenbach et al. 2005; Chen and Bell 2011). ATP hydrolysis catalyzed by ORC and Cdc6 induces the release of Cdt1 and the formation of the ORC/Cdc6/MCM2-7 complex (OCM). The OCM is then converted to a double hexamer through the action of Cdc6’s ATPase activity (Randell et al. 2006; Remus et al. 2009; Evrin et al. 2009, 2013; Coster et al. 2014; Sun et al. 2014). Multiple rounds of MCM2-7 loading are executed by ORC, which requires the ATPase activity of the Orc1 and Orc4 subunits (Bowers et al. 2004).

Fig. 2.1

Establishment of replication origins. Eukaryotic genomes contain multiple specific loci where DNA replication is initiated, called replication origins. In G1-phase of the cell cycle, origins are recognized by ORC, in an ATP-dependent manner. Cdc6, Cdt1, and the replicative helicase MCM2–7 are then recruited onto chromatin. This activates the ATPase activity of ORC and Cdc6, inducing the release of Cdt1 and yielding the OCM complex (ORC/Cdc6/MCM2–7). ATP hydrolysis is also important for a second MCM2–7 complex to be recruited onto DNA, with the help of Cdt1, thus generating the MCM2–7 double hexamer and forming the pre-RC. Additional rounds of ATP-dependent MCM2–7 loading take place, forming an extended pre-RC. As cells exit G1- and enter S-phase, CDKs and DDKs become activated and phosphorylate several key factors, including MCM2 and MCM4, Treslin, and RecQ4, leading to their recruitment to pre-RCs, along with MCM10, Cdc45, TopBP1, and the GINS complex, thus forming the pre-IC

As cells enter S-phase, the cell cycle-regulated kinases DDK (Dbf4-dependent kinase) and CDKs (cyclin-dependent kinases) become activated and phosphorylate the MCM2-7 complex. Notably Cdc7, the catalytic subunit of DDK, phosphorylates MCM2 and MCM4 (Jares and Blow 2000; Tanaka et al. 2007; Heller et al. 2011), which favors association of the MCM2-7 complex with Cdc45 and the GINS complex, forming the CMG (Cdc45-MCM2-7-GINS) complex (Ilves et al. 2010). Also, CDK-phosphorylated Treslin and RecQ4, in addition to TopBP1 and MCM10, play essential roles in CMG helicase activation (Im et al. 2009; Kumagai et al. 2010, 2011; Boos et al. 2011; Thu and Bielinsky 2013). Pre-IC formation is a limiting step in origin activation and thus contributes strongly to the timely activation of origins throughout S-phase (Tanaka et al. 2011).

5.2 Regulation of Origin Licensing

Pre-RC formation must be restricted to a period prior to the initiation of DNA synthesis. Indeed, relicensing once S-phase begins would lead to origin reactivation and subsequent genomic amplification, a situation known as rereplication (Fig. 2.2). Cells that engage in rereplication exhibit DNA damage and genomic instability (Vaziri et al. 2003; Neelsen et al. 2013) associated with cell cycle arrest, apoptosis, or senescence (Vaziri et al. 2003; Melixetian et al. 2004; Zhu et al. 2004).

Fig. 2.2

Rereplication and its suppression. During G1-phase, origin licensing is active and pre-RCs are formed at origins. After S-phase is initiated, if the licensing reaction were kept active, origins would be reactivated, leading to the formation of a nested replication bubble. This process, known as rereplication, can be a source of genome instability. Cells have evolved several overlapping mechanisms to ensure that origin licensing and DNA synthesis are temporally separated. First, the APC/C target Geminin binds and inhibits the licensing activity of Cdt1. Second, protein phosphorylation is also important for restraining licensing. Indeed, phosphorylation of MCM2–7, Cdt1, Orc1, and Cdc6, mainly mediated by CDKs, leads to their functional inactivation. Finally, Cdt1, Orc1, and Cdc6 are subject to ubiquitin-dependent proteolysis by the proteasome

Several partly overlapping mechanisms exist that repress licensing (Fig. 2.2; see Truong and Wu 2011 for a review). During G1-phase, the ubiquitin ligase APC/C-Cdh1 is active and induces the degradation of Geminin (McGarry and Kirschner 1998), an inhibitory protein of Cdt1 (Wohlschlegel et al. 2000; Tada et al. 2001), thus allowing licensing to take place. Once S-phase is initiated, APC/C-Cdh1 is inactivated, permitting the accumulation of Geminin and resulting in the inhibition of Cdt1. The ubiquitin-proteasome pathway also has a more direct role in regulating licensing. Indeed, the ubiquitin ligases CRL4-Cdt2 targeting Cdt1 (Higa et al. 2003; Hu et al. 2004; Arias and Walter 2006), SCF-Skp2 targeting Cdt1 (Li et al. 2003; Nishitani et al. 2006) and Orc1 (Mendez et al. 2002), and APC/C-Cdh1 targeting Cdc6 and Cdt1 (Petersen et al. 2000; Sugimoto et al. 2008) restrain licensing by targeting their substrates to the proteasome, depending on the cellular context. CDK-mediated phosphorylation of pre-RC components also plays an important role in inhibiting origin licensing once DNA synthesis is initiated. Protein phosphorylation can affect subcellular localization, as is the case for Orc1 (Saha et al. 2006), Cdc6 (Petersen et al. 1999), and MCM7 (Nguyen et al. 2000), or impair chromatin association, as documented for Cdt1 (Sugimoto et al. 2004; Chandrasekaran et al. 2011; Miotto and Struhl 2011; Coulombe et al. 2013).

5.3 ORC-Associated Proteins and Links with Chromatin

LRWD1 (leucine-rich WD40 domain containing protein 1), also known as ORCA (ORC-associated protein), was identified as a novel ORC-associated protein through proteomic approaches (Bartke et al. 2010; Shen et al. 2010; Vermeulen et al. 2010). LRWD1 binds Orc2 as well as Cdt1 and Geminin (Shen et al. 2012). The role of LRWD1 seems to be to stabilize the ORC complex on chromatin (Shen et al. 2012), thus allowing licensing. LRWD1 binds epigenetic repressive marks (H3K9me3, H3K27me3, and H4K20me3) (Bartke et al. 2010; Vermeulen et al. 2010; Chan and Zhang 2012) in vitro and is highly enriched in pericentric heterochromatin in an H3K9me3-dependent manner (Chan and Zhang 2012). Interestingly, depletion of LRWD1 or Orc2 induces a derepression of the pericentric region, allowing permissive transcription of major satellite DNA (Chan and Zhang 2012).

The histone acetyltransferase (HAT) HBO1 (HAT binding to Orc1) was shown to bind to Orc1 in a two-hybrid screen (Iizuka and Stillman 1999). This HAT can acetylate histones H3 and H4 in vitro (Iizuka and Stillman 1999) and is responsible for H3K14 acetylation in vivo (Kueh et al. 2011). HBO1 was later shown to act as a positive cofactor for Cdt1, thus stimulating licensing (Miotto and Struhl 2008). HBO1 was also shown to be essential for pre-RC formation and DNA replication in immunodepletion experiments performed in Xenopus egg extracts (Iizuka et al. 2006). Interestingly, HBO1 was shown to acetylate several pre-RC proteins, suggesting a novel mechanism for regulating licensing (Iizuka et al. 2006). Tethering HBO1 to plasmid DNA stimulates episomal replication in vivo (Chen et al. 2013); however, studies of HBO1 knockout mice did not find a role for this HAT in DNA replication (Kueh et al. 2011).

HP1 (heterochromatin protein 1) is an important factor binding the repressive mark H3K9me3 (through its chromodomain) and is involved in constitutive heterochromatin maintenance (reviewed in Canzio et al. 2014). HP1 was shown to physically associate with the Orc1, Orc2, or Orc3 subunits, depending on the experimental system (Pak et al. 1997; Auth et al. 2006; Prasanth et al. 2010). The two factors seem to stabilize one another on DNA and play an important role in maintaining the repressive state of heterochromatin (Pak et al. 1997; Shareef et al. 2001; Prasanth et al. 2010). Paradoxically, in S. pombe, the HP1 homologue (Swi6) involved in the repression of the silent mating type directly interacts with the origin-activating kinase DDK to favor firing of origins early in S-phase (Hayashi et al. 2009).

TRF2, a member of the Shelterin complex, specifically binds telomeres and is important for telomere maintenance (reviewed in Diotti and Loayza 2011; Lewis and Wuttke 2012). TRF2 was shown to associate with Orc1 and be responsible for ORC association with telomeres (Deng et al. 2007). Telomeric regions are transcribed, yielding an RNA molecule called TERRA (Azzalin et al. 2007; Luke et al. 2008). This noncoding RNA associates with the TRF2/ORC complex and helps recruit it to telomeres (Deng et al. 2009). At the telomere, ORC complexes play a role in telomere heterochromatin formation and maintenance (Deng et al. 2007). In addition, the presence of the ORC complex at telomeres favors the formation of pre-RCs and thus replication of these difficult-to-replicate regions (Tatsumi et al. 2008). Consistent with its positive role in licensing, TRF2 can also stimulate the replication of the Epstein-Barr virus DS (dyad symmetry) replication origin by promoting the loading of ORC onto chromatin (Atanasiu et al. 2006).

6 Regulation of Origin Activation During Development

The temporal and spatial patterns of replication origin activation undergo dramatic changes during embryonic development in metazoans. In Drosophila and Xenopus, early embryonic divisions rely exclusively on maternally stockpiled products in the absence of cell growth and transcription and are characterized by rapid cell cycles with short S-phases and no gap phases. After a fixed number of cleavages, embryos undergo a radical change in which the zygotic genome starts being transcribed (Newport and Kirschner 1982a, b). The period when this transition occurs differs between organisms and is called the mid-blastula transition (MBT). This takes place after several cell cycles in amphibians and fishes (Newport and Kirschner 1982a; Kane and Kimmel 1993), while mammalian embryos require zygotic transcription already at the two-cell stage (Schultz 2002).

The MBT is characterized by several events contributing to the prolongation of cell cycle duration: S-phases are lengthened, gap phases are incorporated, and a cell cycle checkpoint is activated (Newport and Kirschner 1982a; Newport and Dasso 1989; Clute and Masui 1997; Finkielstein et al. 2001; Iwao et al. 2005). This transition occurs when an increasing nuclear-to-cytoplasmic (N/C) ratio reaches a critical threshold (Newport and Kirschner 1982a; Edgar and Orr-Weaver 2001; Maller et al. 2001). Before the MBT, active replication origins are spaced every 10–15 kb in Xenopus and Drosophila, and S-phase lasts less than 15 min. In contrast, post-MBT divisions are characterized by a restriction in origin usage, which lengthens the replicon size (Blumenthal et al. 1974; Callan 1974; McKnight and Miller 1977; Hyrien et al. 1995).
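The link between origin spacing and S-phase length can be illustrated with a back-of-the-envelope calculation. The sketch below assumes that adjacent origins fire synchronously and that forks converge at a constant speed; the fork speed and the wide post-MBT-like spacing are hypothetical values chosen for illustration, not measurements taken from the studies cited above.

```python
def min_s_phase_minutes(inter_origin_kb: float, fork_speed_kb_per_min: float) -> float:
    """Minimal time needed to replicate the DNA between two adjacent origins.

    Assumes both origins fire at the same time and that the two converging
    forks each travel half of the inter-origin distance at constant speed.
    """
    if inter_origin_kb <= 0 or fork_speed_kb_per_min <= 0:
        raise ValueError("distance and fork speed must be positive")
    return inter_origin_kb / (2.0 * fork_speed_kb_per_min)


# Hypothetical fork speed, used only to make the arithmetic concrete.
ASSUMED_FORK_SPEED_KB_PER_MIN = 0.5

# 10 and 15 kb are the pre-MBT spacings quoted in the text;
# 100 kb is an arbitrary, illustrative post-MBT-like spacing.
for spacing_kb in (10.0, 15.0, 100.0):
    minutes = min_s_phase_minutes(spacing_kb, ASSUMED_FORK_SPEED_KB_PER_MIN)
    print(f"{spacing_kb:6.1f} kb between origins -> at least {minutes:.1f} min")
```

Under these assumptions, the minimal replication time scales linearly with inter-origin distance, which is why restricting origin usage necessarily lengthens S-phase unless forks move faster.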

Two main hypotheses have been proposed in order to explain the cell cycle changes observed during MBT. During early development, several cell divisions occur in the absence of cell growth, bringing the N/C ratio to that of a typical somatic cell at MBT. A titration of factors involved in origin activation may occur during this phase. Modifications of chromatin structure have also been proposed to explain the transcription onset. We will describe some molecular mechanisms that contribute to these developmental changes, leading in some particular cases to severe disorders.

6.1 Firing of Replication Origins During Development

As mentioned previously, licensed origins are activated during S-phase by the CDK and DDK kinases, which coordinate the recruitment of GINS and Cdc45 to chromatin-bound MCM2-7 (forming the replicative helicase, i.e., the CMG complex) and promote DNA synthesis (Labib 2010; Riera et al. 2014). This reaction requires three essential components for replication initiation: TopBP1/Cut5, Treslin/Ticrr, and RecQ4, the functional vertebrate orthologs of budding yeast Dpb11, Sld3, and Sld2, respectively (Diffley 2010). Sequential steps involving those factors and leading to replication firing in metazoans have been the focus of intense investigations and remain an active field of discovery (see Siddiqui et al. 2013; Tanaka and Araki 2013 for reviews). CDK-dependent phosphorylation of Treslin mediates its interaction with TopBP1, leading to Cdc45 loading onto chromatin (Kumagai et al. 2010, 2011; Boos et al. 2011). In contrast, RecQ4 is recruited onto TopBP1 in a CDK-independent manner. The protein promotes CMG assembly as well as recruitment of DNA polymerase α and RPA (Matsuno et al. 2006; Im et al. 2009).

The main identified target of Cdc7/Dbf4 is the MCM2-7 complex. Phosphorylation of the MCM2, MCM4, and MCM6 subunits switches the replicative helicase to an active state and could also mediate Cdc45 loading (Masai et al. 2000, 2006; Sheu and Stillman 2010).

This pathway of origin activation has recently been implicated in the regulation of Xenopus early development. The levels of Cut5, Treslin, RecQ4, and DRF1 (the major regulatory subunit of Cdc7 during Xenopus early development) progressively become rate limiting for replication as the N/C ratio increases (Takahashi and Walter 2005; Collart et al. 2013). Increasing the amount of these proteins using mRNA microinjection in embryos shortens the inter-origin distance and induces additional rapid cell divisions after the MBT.

B55α, a regulatory subunit of protein phosphatase 2A (PP2A), also becomes limiting for replication origin firing under elevated N/C ratio conditions during Xenopus development (Murphy and Michael 2013). Interestingly, high PP2A activity in “post-MBT-like” conditions counteracts the negative regulation of origin activation and thus maintains a high rate of origin firing similar to that of a pre-MBT replication program. The downstream target(s) of PP2A-dependent firing have not been clearly identified, but ATR is a good candidate. Taken together, these results suggest that PP2A-B55α activity is critical for the regulation of origin activation during embryonic development.

Therefore, the titration of an excess of factors essential for DNA replication may be a key regulatory mechanism, explaining how the inter-origin distance and S-phase are lengthened when an N/C ratio threshold is reached at the MBT (Fig. 2.3).

Fig. 2.3

Influence of the N/C ratio and chromatin organization on the developmental control of replication origin activation. In the pre-MBT stages, loading and activation of the replicative helicase are facilitated by an abundance of initiation factors and easy access to chromatin. PP2A might counteract ATR-dependent inhibition of origin firing (red circles). Titration of replication initiation factors contributes to a restriction of origin activation when the N/C ratio (black arrows) reaches a threshold at the MBT. Several mechanisms emerge and become dominant in post-MBT stages, lowering CDK and DDK activities. Furthermore, epigenetic features restrict activation of the replication program to a more limited set of sites. Modulation of origin activation during the developmental program can have deleterious effects in origin-poor regions where common fragile sites are located.
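The titration hypothesis can be made concrete with a toy calculation. The sketch below assumes a fixed maternal pool of one limiting initiation factor, a doubling of nuclei at each cleavage, and a linear relationship between factor supply and the fraction of licensed origins that fire. The function names, pool size, per-genome demand, and 10 kb licensed spacing are all hypothetical and serve only to show how a threshold can emerge from simple dilution.

```python
def fired_fraction(cycle: int, maternal_pool: float, demand_per_genome: float) -> float:
    """Fraction of licensed origins that fire after `cycle` cleavage divisions.

    Toy assumption: the maternal pool is shared equally among 2**cycle nuclei,
    and firing is saturated as long as each genome receives its full demand.
    """
    nuclei = 2 ** cycle
    supply_per_genome = maternal_pool / nuclei
    return min(1.0, supply_per_genome / demand_per_genome)


def inter_origin_kb(cycle: int, maternal_pool: float, demand_per_genome: float,
                    licensed_spacing_kb: float = 10.0) -> float:
    """Mean distance between fired origins, given evenly spaced licensed origins."""
    return licensed_spacing_kb / fired_fraction(cycle, maternal_pool, demand_per_genome)


# Hypothetical numbers: 4096 arbitrary units of factor, 1 unit needed per genome.
POOL, DEMAND = 4096.0, 1.0
for cycle in range(10, 16):
    spacing = inter_origin_kb(cycle, POOL, DEMAND)
    print(f"cleavage {cycle:2d}: inter-origin distance ~ {spacing:.0f} kb")
```

In this sketch, the inter-origin distance stays constant until the pool no longer covers every genome and then doubles with each division. Doubling `POOL` postpones the transition by one division, which mirrors qualitatively the effect reported for overexpression of the limiting factors.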

6.2 Endoreplication and Gene Amplification: Two Modes of Regulated Over-replication During Development

In some particular cases, cells are programmed to switch from a mitotic cycle to rereplicative states, producing polyploid cells. Several strategies are employed to increase ploidy during development in metazoans. The main mechanism in flies and mammals is endoreplication, which in embryos involves multiple S-phases without entering into mitosis (Zielke et al. 2013). The same replication factors are generally engaged in this process. S-phases in endocycling cells from embryos of mice and other metazoans are driven essentially by oscillations in the activity of CDK2-Cyclin E (Geng et al. 2003; Parisi et al. 2003; Tetzlaff et al. 2004; Zielke et al. 2011).

Although favorable to successive rounds of DNA replication, endocycling cells must avoid rereplication. When CDK2-Cyclin E activity is low, high APC/C activity degrades mitotic cyclins as well as Geminin, opening a window of opportunity for licensing to occur. In contrast, high levels of CDK2-Cyclin E activity initiate replication and decrease the action of APC/C (Reber et al. 2006; Keck et al. 2007). APC/C-dependent oscillation of Geminin also appears to be important for endocycles (Zielke et al. 2008). Finally, CDK2-Cyclin E is regulated by the CDK inhibitors Dacapo in flies or p57Kip2 in mice (de Nooij et al. 2000; Hattori et al. 2000; Ullah et al. 2009).
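The logic of the endocycle oscillator described above can be summarized as a small Boolean sketch. It is a deliberate simplification that treats CDK2-Cyclin E activity as either low or high and ignores kinetics, CDK inhibitors, and all other regulators; the function and key names are hypothetical.

```python
def endocycle_state(cdk2_cyclin_e_high: bool) -> dict:
    """Boolean caricature of one endocycle phase.

    Low CDK2-Cyclin E -> APC/C active -> Geminin degraded -> licensing allowed.
    High CDK2-Cyclin E -> APC/C inhibited -> Geminin present -> origins fire,
    but no new licensing can take place.
    """
    apc_c_active = not cdk2_cyclin_e_high
    geminin_present = not apc_c_active
    return {
        "apc_c_active": apc_c_active,
        "geminin_present": geminin_present,
        "licensing_open": apc_c_active and not geminin_present,
        "origin_firing": cdk2_cyclin_e_high,
    }


# Alternate low and high CDK2-Cyclin E activity over two toy endocycles.
for cdk_high in (False, True, False, True):
    state = endocycle_state(cdk_high)
    phase = "S-phase" if state["origin_firing"] else "gap phase"
    print(f"{phase:9s} licensing_open={state['licensing_open']}")
```

The point of the sketch is that licensing and firing are never permitted in the same state, so each oscillation yields one round of replication without rereplication.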

Alternatively, DNA rereplication can be used during development to increase the copy number at a particular locus. This event, termed gene amplification, is required to increase the number of gene copies for a tissue-specific function. Chorion gene amplification during Drosophila oogenesis occurs by repeated activation of selected origins. Acetylation of nucleosomes seems to be required for the selection of specific replication origins in amplified gene regions (McConnell et al. 2012).

7 Human Developmental Diseases Associated with Deregulation of Replication

Defects in resolving DNA replication stress have been described for several human disorders. Multiple mutations in genes encoding DNA replication regulators have been identified in various genetic syndromes often characterized by developmental defects, neurological disorders, and growth retardation, reflecting roles in processes requiring high rates of cell proliferation.

Mutations in pre-RC components Orc1, Orc4, Orc6, Cdt1, and Cdc6 have been identified in Meier-Gorlin syndrome patients suffering from severe developmental malformations (Bicknell et al. 2011a, b; Guernsey et al. 2011). Mutation of MCM4 was also reported in individuals with chromosome instability (Gineau et al. 2012; Hughes et al. 2012). Mutations of the replication initiation factor RecQ4 are associated with Rothmund-Thomson syndrome (Larizza et al. 2010). Although all these disorders show growth defects suggesting problems in replicating DNA during embryogenesis, the multiple clinical features observed suggest that these proteins regulate other replication-independent functions during development. RecQ4, for instance, contains one helicase domain necessary for DNA repair (Bachrati and Hickson 2008).

Preferential origin activation during development and differentiation can pose a serious threat to genome integrity, as described for common fragile sites (CFSs, reviewed in Debatisse et al. 2012). Pathologies associated with CFSs mainly involve DNA repeat expansion over a certain threshold, altering nearby gene expression (Cleary et al. 2002; Voineagu et al. 2009). Fragile X-related disorders, Huntington’s disease, and myotonic dystrophy are human hereditary diseases characterized by such repeat instability. Oogenesis and early embryonic development seem favorable to repeat expansion in contrast to somatic cells. In fragile X syndrome (FXS), expansion of the (CGG)n repeat motif located in the 5′-UTR of the FMR1 gene has recently been linked to the inactivation of an upstream replication origin, whereas in normal human embryonic stem cells (hESCs), this region is replicated by two flanking origins (Gerhardt et al. 2014a). Interestingly, differentiation of FXS-affected hESCs restores the replication program, providing an explanation for the expansions happening mainly during early embryonic development. Substitution of one thymidine by cytosine in the upstream replication initiation site has been proposed as the genetic determinant by which origin activation is silenced (Gerhardt et al. 2014b), but the molecular mechanisms involved remain to be elucidated.

Finally, rereplication is a source of DNA damage that promotes genome instability, a hallmark of cancer (Hook et al. 2007; Blow and Gillespie 2008). Rereplication activates a DNA damage response, whose consequences mainly depend on the cellular background (Blow and Dutta 2005). During development, rereplication blocks cell cycle progression, leading to embryonic lethality (Hara et al. 2006). While experimental data revealed that chromosome breaks and rearrangements result from rereplication, a direct relationship with tumorigenesis and human cancer has not yet been clearly defined.

8 Conclusions and Perspectives

With more and more genome-wide analyses reported on the nature of replication origins, as well as on the organization of the genome in the cell nucleus, the characteristics of replication origins are progressively being unveiled. It now appears clear that they are not set by unique combinations of signatures, but are highly flexible, both in their features and in their usage during the cell cycle. Activation of metazoan origins shares critical functions with epigenetic controls, probably to adapt or coordinate the organization of replication domains to cell-fate specification and patterning during embryonic development. Adaptation of replication origin activation in response to checkpoint controls and DNA damage, treated in other chapters of this book, is another demonstration of the necessary flexibility of origins for the maintenance of genome integrity. Several questions remain unanswered. At the structural level, how is DNA organized at a replication origin? Are structural components specifically involved in replication foci and the organization of replication origins along the genome? Do some replication proteins have other functions uncoupled from DNA replication itself? Diseases linked to the regulation of replication origins are being identified, but the expected link with some cancers has yet to be clearly established.

Undoubtedly, initiation of DNA replication is awaiting new exciting discoveries, and its regulation is likely to exhibit new relationships with apparently unrelated domains of biology, as it has to be coordinated with the organization of chromosomes for most aspects of nuclear metabolism.