Studies in our laboratory over the last three decades have shown that the Chinese hamster dihydrofolate reductase (DHFR) origin of replication corresponds to a broad zone of inefficient initiation sites distributed throughout the spacer between the convergently transcribed DHFR and 2BE2121 genes. It is clear from mutational analysis that none of these sites is genetically required for controlling origin activity. However, the integrity of the promoter of the DHFR gene is needed to activate the downstream origin, while the 3′ processing signals prevent invasion and inactivation of the downstream origin by transcription forks. Several other origins in metazoans have been shown to correspond to zones of inefficient sites, while a different subset appears to be similar to the fixed replicators that characterize origins in S. cerevisiae and lower organisms. These observations have led us to suggest a model in which the mammalian genome is dotted with a hierarchy of degenerate, redundant, and inefficient replicators at intervals of a kilobase or less, some of which may have evolved to be highly circumscribed and efficient. The activities of initiation sites are proposed to be largely regulated by local transcription and chromatin architecture. Recently, we and others have devised strategies for identifying active origins on a genome-wide scale in order to define their distributions between fixed and dispersive origin types and to detect relationships among origins, genes, and epigenetic markers. The global pictures emerging are suggestive but far from complete and appear to be plagued by some of the same uncertainties that have led to conflicting views of individual origins in the past (particularly DHFR). 
In this paper, we will trace the history of origin discovery in mammalian genomes, primarily using the well-studied DHFR origin as a model, because it has been analyzed by nearly every available origin mapping technique in several different laboratories, whereas many other origins have been identified by only one. We will address the strengths and shortcomings of the various methods used to identify and characterize origins in complex genomes and will point out how we and others were sometimes led astray by false assumptions and biases, as well as by insufficient information. The goal is to help guide future experiments that will provide a truly comprehensive and accurate portrait of origins and their regulation. After all, in the words of George Santayana, “Those who cannot remember the past are condemned to repeat it.”
Key words: mammalian replication origins; DHFR; 2-D gels; nascent strands; replication initiation
Abbreviations
- AluI elements
Repetitive sequence elements that can be excised from the genome by the restriction enzyme AluI
- ARS element
Autonomously replicating sequence element
- AT-rich
Rich in adenosine monophosphate and thymidine monophosphate
- BND cellulose
Benzoylated naphthoylated DEAE cellulose
- CHO
Chinese hamster ovary
- CpG islands
Nested stretches of alternating cytidine and guanosine monophosphates in complex genomes
- DHFR
Dihydrofolate reductase gene
- dnaA
The gene encoding the DnaA initiation protein from Escherichia coli
- ENCODE
Encyclopedia of DNA elements
- High C₀t DNA
Repetitive DNA from complex genomes
- M13
The bacteriophage M13
- OriC and SV40 ori
The origins of replication in the Escherichia coli and SV40 viral chromosomes, respectively
- PCR
Polymerase chain reaction
- ³²P-dCTP
³²P-labeled deoxycytidine triphosphate
Early models for identifying origins
The now-classic single-molecule fiber autoradiographic experiments of Huberman and Riggs published in 1968 opened up a new era in the mammalian DNA replication field (Huberman and Riggs 1968). Not only did these studies allow a relatively precise measure of fork rates in mammalian cells, they also established that the majority of origins are bidirectional, are spaced 15–300 kb apart, and sometimes fire coordinately in clusters. It is remarkable that the original interpretations of these data have stood the test of time during the 40 years of subsequent biochemical and genetic investigation into the nature of individual origins in eukaryotic organisms ranging from the simple yeasts to mammals. An important philosophical concept was also introduced by the very nomenclature that Huberman and Riggs utilized to describe initiation sites in mammalian cells: they were termed origins to call attention to the fact that their studies did not say anything about whether initiation was directed by required genetic replicators, as defined by Jacob and Brenner (1963).
The obvious drawback of fiber autoradiography, of course, is the anonymity of any given initiation event identified by grain tracks in an autoradiogram and, thus, the inability to learn about origin sequences or their distributions relative to other genetic markers (although modern versions of this approach have partially corrected this situation; Herrick and Bensimon 2009; Schurra and Bensimon 2009). The challenge, therefore, has been to localize individual initiation sites along chromosomes biochemically. Following the experimental pathway that led to the identification of Escherichia coli oriC (Yasuda and Hirota 1977; von Meyenburg et al. 1978) and of autonomously replicating sequence (ARS) elements in S. cerevisiae (Chan and Tye 1980; Stinchcomb et al. 1980), attempts were made early on to rescue “replicators” from mammalian chromosomes by tethering genomic restriction fragments to a selectable marker, transfecting them into a suitable host cell, and assaying for high-frequency transformation (Roth et al. 1983; Holst et al. 1988; Krysan et al. 1989; Heinzel et al. 1991). While certain sequences may, in fact, replicate better than others when examined individually (e.g., Frappier and Zannis-Hadjopoulos 1987; Leffak and James 1989), this approach has not succeeded in identifying a comprehensive subset of genomic restriction fragments that contain bona fide replicators. In hindsight, this failure probably reflects the very large number of potential initiation sites, each with only modest replicator activity, that populate any restriction fragment above a certain size (see below).
The problem with applying the strategy used to map E. coli oriC (synchronous release from a dnaA temperature-sensitive block followed by pulse labeling) to mammalian genomes is, of course, one of complexity. OriC is the only initiation site in the E. coli genome under most circumstances, and its ∼4 × 10⁶-bp genome contains only ∼10³ restriction fragments. Hence, a readable pattern could be deduced from the few dozen origin-proximal fragments that were labeled in the first few minutes after release from the dnaA ts block. However, the typical diploid mammalian genome probably contains >5 × 10⁴ start sites distributed among 6 × 10⁹ bp of sequence. Regardless of the degree of synchrony that might be achieved with cultured mammalian cells, no conceivable labeling strategy could sort out the complex labeling pattern of several thousand origins firing almost simultaneously as a mammalian cell enters S phase.
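The complexity argument can be made concrete with a back-of-envelope calculation. The Python sketch below is only an illustration and assumes that a six-base recognition site (e.g., EcoRI) occurs at random in sequence of equal base composition, so that one site is expected every 4⁶ = 4,096 bp; real genomes deviate from this, and the genome sizes and origin number are simply the round figures quoted above.

```python
# Back-of-envelope comparison of E. coli and mammalian genome complexity.
# Assumption: six-base recognition sites occur at random (one per 4**6 bp).

SITE_LENGTH = 6  # e.g., a six-base cutter such as EcoRI


def expected_fragments(genome_bp: float, site_length: int = SITE_LENGTH) -> float:
    """Expected number of restriction fragments in a random-sequence genome."""
    return genome_bp / 4 ** site_length


def mean_origin_spacing_kb(genome_bp: float, n_origins: float) -> float:
    """Average distance between start sites if they were evenly spaced."""
    return genome_bp / n_origins / 1_000


ecoli_fragments = expected_fragments(4e6)     # one origin (oriC) among these
mammal_fragments = expected_fragments(6e9)    # tens of thousands of origins among these
spacing_kb = mean_origin_spacing_kb(6e9, 5e4)

print(f"E. coli fragments:   {ecoli_fragments:,.0f}")
print(f"Mammalian fragments: {mammal_fragments:,.0f}")
print(f"Mean origin spacing: {spacing_kb:.0f} kb")
```

Under these assumptions the ∼1,000 E. coli fragments match the ∼10³ cited above, and a mean spacing of ∼120 kb falls within the 15–300-kb range measured by fiber autoradiography; early-labeled fragments from tens of thousands of origins would be scattered among well over a million fragments.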
Strategies for localizing initiation sites in complex genomes
In the early 1980s, our laboratory took advantage of the fortunate congruence of three important features of Chinese hamster cells in order to identify and characterize the first mammalian origin of replication: (1) their ability to be arrested in a non-S-phase condition (i.e., G0) in response to isoleucine deprivation (Tobey 1973), thus allowing subsequent collection at the G1/S boundary after release from the G0 block into a replication inhibitor (Hamlin and Pardee 1976); (2) their ability to amplify a relatively circumscribed region of the genome (the dihydrofolate reductase (DHFR) domain) in response to increasing concentrations of the competitive inhibitor, methotrexate (Biedler and Spengler 1976); and (3) the availability of several interesting variants developed in other labs that subsequently allowed genetic analysis of elements required to effect initiation in the locus (Urlaub et al. 1986; Jin et al. 1995). We began our origin studies by developing a Chinese hamster ovary (CHO) cell line (CHOC400) that had amplified a 240-kb region encompassing the DHFR gene ∼800 times, with the consequence that the 50 or so restriction fragments within the repeating amplicon could actually be visualized against the background of single-copy fragments in agarose gel separations (Milbrandt et al. 1981). These amplified repeats were analogous to a high copy number plasmid, except that the multiple copies were stably arrayed in tandem in the bodies of chromosomes.
Identifying early-labeling fragments as cells enter S phase
The fact that CHOC400 cells could be so easily synchronized allowed the selective labeling of the earliest-replicating restriction fragments (ELFs) in the repeating unit by brief release from a G1/S aphidicolin or hydroxyurea block into ¹⁴C-thymidine (Heintz and Hamlin 1982). About a half-dozen bands were identified by this protocol, of which a 6-kb band was the most prominent, followed by an 11-kb fragment. Essentially the same ELFs were detected when cells were released from a G0 block into medium containing ¹⁴C-thymidine and the leaky chain terminator cytosine arabinoside, which stalls replication forks very close to initiation sites (Heintz and Hamlin 1983a, b). Thus, we had approximately recapitulated the experiments that mapped oriC to only a few neighboring restriction fragments in the E. coli genome (Marsh and Worcel 1977).
[Note that henceforward, we will italicize the important messages that we feel should be considered in designing origin-finding schemes in future.]
The limitation of this in vivo labeling approach was the difficulty of discerning which of the ELFs was labeled most intensely (i.e., first), since the background labeling from thousands of single-copy, early-firing origins precluded quantification of relative specific activities. In addition, subsequent mapping studies showed that there were actually two adjacent 6-kb EcoRI fragments in this region, the most 5′ of which was excluded from cosmid S21 (Fig. 2; Heintz et al. 1983). Therefore, we did not know whether the 6-kb band was the most highly labeled because it represented two adjacent fragments or because one member replicated first and therefore must contain the origin.
To address this problem, an independent study devised a different strategy in which early-replicating DNA was radiolabeled either in vivo or in vitro in the first few minutes of S phase, and the sheared, labeled DNA was hybridized to a series of similarly sized XbaI fragments from the intergenic spacer that encompassed the two 6-kb ELFs and a few flanking fragments, but not the region flanking the 11-kb ELF (Burhans et al. 1986). When relative labeling was compared among these fragments, the XbaI fragment most closely corresponding to the upstream 6-kb ELF (termed R1-F′) was the most active. Thus, it was concluded that the origin in this locus was confined to a limited region of the intergenic spacer encompassed by R1-F′. As we will see, later studies proved this conclusion to be essentially incorrect.
In an attempt to quantify the actual in vivo labeling pattern across the intergenic spacer as a whole, we adapted a clever in-gel renaturation technique developed by Roninson (1983): a restriction digest of early-labeled DNA was separated on an agarose gel, and the gel was then subjected to multiple cycles of denaturation and renaturation, the latter lasting only long enough for amplified sequences to reanneal (Leu and Hamlin 1989). With the background labeling reduced essentially to zero by this procedure, densitometry allowed an estimate of labeling intensities relative to labeled log-phase DNA. Once again, the 6-kb doublet was the most highly labeled, and the 11-kb ELF was only somewhat less so. By digesting the genomic DNA with an enzyme combination (BamHI and HindIII) that reduced the average fragment length, we obtained a much higher-resolution picture of the labeling pattern in the spacer region, which confirmed the suggestion that the upstream 6-kb fragment (R1-F′) was somewhat more intensely labeled than its 6-kb neighbor, R1-F (Burhans et al. 1986). Importantly, however, this analysis clearly showed that the peak at R1-F′ was approximately equal to that of the 11-kb ELF lying ∼22 kb downstream (Fig. 2, dashed curve). The two peaks were named ori-β and ori-γ. (A contemporaneous study on a larger DHFR amplicon in another cell line uncovered an additional origin, ori-α, about 200 kb upstream; Ma et al. 1990.)
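Why a timed renaturation step separates amplified from single-copy sequences can be illustrated with ideal second-order reassociation kinetics, in which the fraction of a sequence that has reannealed is k·C₀t/(1 + k·C₀t). The sketch below is a simplification under stated assumptions: an ∼800-fold amplified sequence is treated as a single-copy sequence at 800-fold higher concentration, and the renaturation time is chosen (hypothetically) so that only 1% of single-copy DNA reanneals.

```python
# Ideal second-order reassociation: fraction reannealed = k*C0t / (1 + k*C0t).
# Assumption: amplification multiplies a sequence's effective concentration,
# and therefore its k*C0t product, by the copy number.

COPY_NUMBER = 800  # approximate DHFR amplicon copy number in CHOC400 cells


def fraction_reannealed(k_c0t: float) -> float:
    """Fraction of a sequence that has reannealed at a given k*C0t value."""
    return k_c0t / (1.0 + k_c0t)


def k_c0t_for_target(target_fraction: float) -> float:
    """Invert fraction_reannealed: k*C0t needed to reach target_fraction."""
    return target_fraction / (1.0 - target_fraction)


# Hypothetical renaturation time at which only 1% of single-copy DNA has reannealed.
single_copy_kc0t = k_c0t_for_target(0.01)
amplified_kc0t = single_copy_kc0t * COPY_NUMBER

single_copy = fraction_reannealed(single_copy_kc0t)
amplified = fraction_reannealed(amplified_kc0t)
print(f"single-copy reannealed: {single_copy:.3f}")
print(f"amplified reannealed:   {amplified:.3f}")
```

In this idealized picture nearly 90% of the amplified DNA is double-stranded while almost all single-copy DNA remains single-stranded, which is the property that allowed background labeling from non-amplified early-firing origins to be suppressed.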
Unfortunately, the second peak in the region of the 11-kb ELF could not have been detected in the earlier ELF hybridization study (Burhans et al. 1986), since the target clones in that study did not encompass most of this fragment. This omission ultimately contributed to the emerging and influential notion that R1-F′ contained the major initiation site in the DHFR locus. As we will see, we also misinterpreted our own data by assuming that the pyramidal shape of the peaks detected in the in-gel studies resulted from fixed start sites that fired at slightly different times in different cells in the population owing to imperfect synchrony (as illustrated in Fig. 1). This misconception arose directly from the similarity between our labeling patterns and those obtained in the comparable experiment on E. coli, whose origin is fixed (Marsh and Worcel 1977). In addition, because the fragments between and flanking the 6- and 11-kb EcoRI fragments were all small, rendering their signals relatively weak, there was no reason to believe that these fragments contained initiation sites.
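The ambiguity can be illustrated with a deliberately simple toy model, offered as a hypothetical sketch rather than a reconstruction of the actual experiments. Model A places a single fixed origin at position 0 that fires at a random time within an imperfectly synchronized window; model B fires one randomly chosen site per cell, all at the start of the pulse, from anywhere within a broad zone. Fork rate, pulse length, firing window, and zone width are all assumed values. Both models give a symmetric labeling profile that peaks at the center and falls off linearly, so peak shape alone cannot distinguish a fixed origin from an initiation zone.

```python
# Toy comparison of two scenarios that both yield "pyramidal" labeling peaks.
# All parameter values are hypothetical and chosen only for illustration.

FORK_KB_PER_MIN = 1.5   # assumed fork rate
PULSE_MIN = 10.0        # assumed labeling pulse after release
REACH_KB = FORK_KB_PER_MIN * PULSE_MIN  # distance a fork can travel during the pulse


def fixed_origin(x_kb: float, firing_window_min: float = 20.0) -> float:
    """Fraction of cells labeled at x when one fixed origin at 0 fires at a
    time drawn uniformly from [0, firing_window_min] (imperfect synchrony)."""
    latest_firing = PULSE_MIN - abs(x_kb) / FORK_KB_PER_MIN
    return min(1.0, max(0.0, latest_firing / firing_window_min))


def initiation_zone(x_kb: float, zone_half_width_kb: float = REACH_KB) -> float:
    """Fraction of cells labeled at x when each cell fires once, at time 0,
    from a site drawn uniformly from [-zone_half_width_kb, +zone_half_width_kb]."""
    lo = max(x_kb - REACH_KB, -zone_half_width_kb)
    hi = min(x_kb + REACH_KB, zone_half_width_kb)
    return max(0.0, hi - lo) / (2 * zone_half_width_kb)


def normalized(profile, positions):
    """Evaluate a profile at each position and scale it to a peak of 1."""
    values = [profile(x) for x in positions]
    peak = max(values)
    return [v / peak for v in values]


positions = [-30, -20, -10, 0, 10, 20, 30]
for name, profile in [("fixed origin", fixed_origin), ("initiation zone", initiation_zone)]:
    print(name, [round(v, 2) for v in normalized(profile, positions)])
```

The two peaks differ in width here, but the width of either one depends on fork rate, pulse length, and synchrony, none of which is known precisely enough to use as a discriminator.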
The presence of two origins in the spacer (ori-β and ori-γ) was further suggested by an independent study in which CHOC400 cells were treated at the G1/S border with psoralen to cross-link the two strands of the duplex every 10 kb or so, followed by release into BrdU or ³H-thymidine for a brief interval to label just the DNA close to origins (Anachkova and Hamlin 1989). The 6- and 11-kb ELFs were again shown to be the most highly labeled as cells entered S phase. Interestingly, the 6-kb ori-β-containing fragment was subsequently shown to contain a highly repetitive AluI element (Leu et al. 1990).
Thus, it was beginning to look as though origins in mammalian cells might correspond to specific sites (and potentially to replicators). The presence of a repetitive element in the ori-β locus even suggested that at least some origins might share important sequences, although no AluI elements were present in the equally active ori-γ locus. The finding of an AluI element in R1-F′ presumably did not affect the in vivo labeling patterns of fragments separated by their unique sizes in gels (Heintz and Hamlin 1982; Leu and Hamlin 1989). However, the presence of the repeated element in or near ori-β obviously clouded studies in which early-labeled DNA was used as a hybridization probe on cloned sequences from the region (Burhans et al. 1986; Anachkova and Hamlin 1989), since the labeled AluI elements could have come from early-firing origins elsewhere in the genome. It would clearly be difficult to correct hybridization intensities for AluI elements, since using high C₀t DNA to block them from the probe totally eliminated hybridization to R1-F′, probably by forming networks with adjacent unique sequences in the region (Anachkova and Hamlin 1989). Thus, it is likely that the presence of an AluI element in R1-F′ was partially responsible for this fragment appearing to be the most active in the region.
For reasons that are difficult to fully understand even now, from this time onward almost everyone but ourselves assumed that ori-β was the most active site in the DHFR locus and therefore probably corresponded to a replicator. Ori-γ was essentially ignored for years, even though we regarded it as a second origin of comparable activity in the region. As we will see, even this was a misinterpretation and oversimplification of our own data.
Identifying origins by localizing the positions at which leading and lagging strands switch templates
Meanwhile, others were devising their own in vivo labeling schemes to follow replication forks (and potentially identify initiation sites) in mammalian genomes. Two potentially powerful approaches appeared in 1989 and 1990, both of which were initially tested on the Chinese hamster DHFR locus. These methods rely on the fact that leading and lagging strands switch templates at initiation sites (Fig. 1). The first technique (Roufa and Marchionni 1982) depended on an older suggestion that nucleosomes are distributed only to the leading strand at the replication fork (Seidman et al. 1979). Hence, partial inhibition of protein (and thereby histone) synthesis with emetine in the presence of BrdU should allow nucleosome-free DNA to accumulate on the slowly elongating, BrdU-labeled retrograde arm of each fork. This should subsequently be degraded preferentially with micrococcal nuclease, leaving the nucleosome-protected 145 bp of DNA on the leading strand largely intact. Isolation of heavy single-stranded DNA and hybridization to the plus and minus strands of M13 clones encompassing a region of interest should then indicate where the template switches and therefore the position of the origin (Fig. 1).
In fact, in the first application of this method to a mammalian cell line (Handeli et al. 1989), the DHFR origin was used as the positive control, with the expectation that the switch would occur at ori-β. Instead, two switches were observed: one at ori-β and one encompassing ori-γ. Since our laboratory was in the process of publishing the high-resolution in-gel renaturation experiments identifying these two preferred early-labeled regions, this study provided critical independent confirmation that both regions were likely to contain origins of replication. The theoretical power of this technique is that it should allow an initiation site to be mapped within a few nucleosomes of the start site if that site is fixed and efficient. In practice, the primary data painted a much less clear picture: the degree of bias varied substantially across the ∼35-kb core of the DHFR intergenic region, but nowhere was it as great as in the DHFR gene itself. It turns out that the variable and underwhelming biases were probably a mixture of reality (see below) and the inherent imprecision of the method itself, neither of which could have been appreciated by the authors or by us at the time.
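The template-switch logic on which this assay depends reduces to a simple calculation: at each probed position, compare the hybridization of labeled strands to the two template strands and find where the strand bias changes sign. The Python sketch below uses invented signal values for illustration; the function names are hypothetical and the sign convention (which strand dominates on which side) is arbitrary. With a dispersed or inefficient origin, the biases are small and noisy, and the crossing is correspondingly poorly defined.

```python
# Hypothetical strand-specific hybridization signals along a region (positions in kb).
# Bias = (top - bottom) / (top + bottom); a sign change marks a template switch.

def strand_bias(top: float, bottom: float) -> float:
    """Normalized difference between signals on the two template strands."""
    total = top + bottom
    return 0.0 if total == 0 else (top - bottom) / total


def template_switches(positions, top_signals, bottom_signals):
    """Return (left_kb, right_kb) intervals in which the strand bias changes sign."""
    biases = [strand_bias(t, b) for t, b in zip(top_signals, bottom_signals)]
    switches = []
    for i in range(len(biases) - 1):
        if biases[i] * biases[i + 1] < 0:   # strictly opposite signs
            switches.append((positions[i], positions[i + 1]))
    return switches


# Invented example data: bias favors the top strand on the left, the bottom on the right.
positions = [0, 2, 4, 6, 8, 10]
top = [70, 65, 60, 40, 35, 30]
bottom = [30, 35, 40, 60, 65, 70]
print(template_switches(positions, top, bottom))
```

The resolution of such a map is limited by probe spacing, which is one reason the precision of these assays depended heavily on how densely a region was sampled.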
The second fork direction method was based on the original experiments used to demonstrate the presence of RNA primers on the 5′ ends of leading and lagging nascent strands and subsequently to map the sites of initiation around E. coli oriC and SV40 ori (Okazaki et al. 1975; Kurosawa et al. 1975; Hay and DePamphilis 1982). In this method, the 5′ ends of RNA-primed leading and lagging strands are selectively labeled with ³²P after the 5′-OH at the RNA–DNA primer junctions is exposed by base treatment. The labeled DNA is then used to probe the separated strands of the region of interest to localize the template switch and, by inference, the approximate start site. When a 4-kb region encompassing ori-β in the Chinese hamster DHFR locus was analyzed by this lagging strand assay, quantitative analysis suggested that more than 80% of the Okazaki fragments switched templates within this region (Burhans et al. 1990). The same result was observed whether the Okazaki fragments were isolated in early S phase, when the origin is firing, or from unsynchronized cell populations. It was concluded that ori-β represented the major initiation site in the intergenic spacer, even though the rest of the spacer was not sampled.
Thus, it appeared that this method could give a very high-resolution picture of replication start sites in defined regions of complex genomes and that mammalian origins might be just as efficient as, say, oriC. This would require, however, that ori-γ not be active or, if it were, that something prevent forks from entering ori-β in early S phase (e.g., a replication fork block). The origin would also have to be almost 100% efficient to explain such a large bias in template usage at ori-β in log-phase cells. However, we already had very good evidence that initiation occurs in this locus in probably fewer than 30% of cell cycles, so that passive replication was occurring in the other 70%. In addition, we could not repeat these results in our own laboratory in a later study, even in early S-phase cells (Wang et al. 1998). At present, we still cannot fully reconcile these two studies with each other or with the later studies described below.
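The tension between an ∼80% template-switch bias and an origin that fires in fewer than 30% of cell cycles can be checked with simple arithmetic. Assume, for illustration only, that in cell cycles in which the origin does not fire, the site is replicated passively by a fork arriving from either side with equal probability, and treat the reported 80% figure as a strand bias b on each side of the origin. Then, for firing efficiency e, the fraction of cycles in which a site just beside the origin is replicated by a fork moving away from it is b = e + (1 − e)/2, so an observed bias b requires e = 2b − 1.

```python
# Relationship between origin efficiency and the strand bias next to it.
# Assumption: when the origin does not fire, passive forks arrive from the left
# or the right with equal probability.

def expected_bias(efficiency: float) -> float:
    """Fraction of cycles in which a site just beside the origin is replicated
    by a fork moving away from the origin."""
    return efficiency + (1.0 - efficiency) / 2.0


def required_efficiency(observed_bias: float) -> float:
    """Invert expected_bias: firing efficiency needed to produce observed_bias."""
    return 2.0 * observed_bias - 1.0


print(f"bias expected at 30% efficiency: {expected_bias(0.30):.2f}")
print(f"efficiency needed for 80% bias:  {required_efficiency(0.80):.2f}")
```

Under this assumption, an 80% bias would require the origin to fire in roughly 60% of cell cycles, whereas firing in 30% of cycles predicts a bias of only about 65%; closing the gap would require passive forks to be excluded from one side, as noted above.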
There was another critical outcome of the latter two studies. The laboratories that developed this pair of “lagging strand” assays discovered that their two preparations actually hybridized preferentially to opposite strands in the DHFR locus. A careful analysis later showed that emetine actually allows preferential incorporation of BrdU into the leading strand because it inhibits Okazaki fragment synthesis (Burhans et al. 1991). Thus, overnight, the emetine-based “lagging strand” assay became a “leading strand” assay. Ultimately, the results of the two methods were shown to be somewhat concordant with each other, although the biases observed in the leading strand assay (Handeli et al. 1989) never approached those suggested by its complement (Burhans et al. 1990).
Localizing origins by identifying the positions of small nascent strands along the template
Concurrent with this large set of somewhat different labeling studies was the development of an approach that could potentially allow the isolation of a comprehensive set of origin-centered DNA with relatively high resolution and, importantly, from diploid loci. The diagram in Fig. 1 points out that nascent strands are approximately centered over the replication initiation site. Were the smallest origin-centered fragments to be released and purified away from the template, their approximate copy numbers at different positions in a region could be quantified using polymerase chain reaction (PCR). Two approaches have been used to release nascent strands: (1) branch migration to yield double-stranded intermediates (Kaufmann et al. 1985) and (2) heating well above the melting temperature to release BrdU-labeled single-stranded nascent strands from the template (Vassilev and Johnson 1989; Vassilev et al. 1990).
We will focus on the second method, since the Chinese hamster DHFR locus was among the first mammalian origins to be analyzed by this approach (Vassilev et al. 1990). Log-phase cells were pulse-labeled with BrdU; nascent strands were separated from the template by alkaline denaturation and size-fractionated on alkaline sucrose gradients. An anti-BrdU antibody was used to isolate the nascent strands, and selected segments of the ∼4-kb region encompassing ori-β were amplified by PCR using appropriate primers. The products were then quantified by probing dot blots of the PCR products with ³²P-labeled cognate oligonucleotides. The resulting data showed a clear enrichment of sequences in the ori-β region identified in previous studies (Burhans et al. 1986; Anachkova and Hamlin 1989; Leu and Hamlin 1989; Burhans et al. 1990; Leu et al. 1990). Thus, the argument that ori-β was the major start site in the region was once again bolstered, even though the region encompassing ori-γ and the rest of the spacer was not examined.
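The principle that short nascent strands are centered on their start sites implies a simple expected abundance profile, and also shows why sampling a small window cannot reveal peaks outside it. The sketch below assumes, hypothetically, bidirectional initiation, nascent-strand lengths uniform up to a maximum L, and two equally active sites 20 kb apart; a strand of length ℓ covers positions within ℓ/2 of its start site, so the abundance contributed by one site at offset d is 1 − 2|d|/L. All names and values are illustrative.

```python
# Expected nascent-strand abundance for hypothetical origin positions.
# Assumptions: bidirectional initiation, strands centered on their start site,
# strand lengths uniform on [0, max_length_kb].

def abundance_near_origin(offset_kb: float, max_length_kb: float) -> float:
    """Fraction of nascent strands from one origin that contain a sequence
    offset_kb away: a strand of length L covers positions within L/2."""
    return max(0.0, 1.0 - 2.0 * abs(offset_kb) / max_length_kb)


def abundance(x_kb, origins_kb, weights, max_length_kb):
    """Sum contributions from several start sites, weighted by firing frequency."""
    return sum(w * abundance_near_origin(x_kb - o, max_length_kb)
               for o, w in zip(origins_kb, weights))


def local_peaks(positions, values):
    """Positions whose (nonzero) value is at least as large as both neighbors."""
    peaks = []
    for i, v in enumerate(values):
        left = values[i - 1] if i > 0 else float("-inf")
        right = values[i + 1] if i < len(values) - 1 else float("-inf")
        if v > 0 and v >= left and v >= right:
            peaks.append(positions[i])
    return peaks


# Two hypothetical, equally active sites 20 kb apart; strands up to 4 kb long.
ORIGINS_KB = [0.0, 20.0]
WEIGHTS = [0.5, 0.5]
MAX_LENGTH_KB = 4.0

window = [-2, -1, 0, 1, 2]          # a ~4-kb window around the first site only
full_scan = list(range(-4, 25))     # the whole hypothetical region

window_peaks = local_peaks(
    window, [abundance(x, ORIGINS_KB, WEIGHTS, MAX_LENGTH_KB) for x in window])
full_peaks = local_peaks(
    full_scan, [abundance(x, ORIGINS_KB, WEIGHTS, MAX_LENGTH_KB) for x in full_scan])
print("peaks seen in the window:", window_peaks)
print("peaks seen in a full scan:", full_peaks)
```

A window confined to one candidate site can confirm enrichment there but says nothing about whether other sites of similar activity exist elsewhere in the spacer.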
Two-dimensional gel analysis suggests a noncanonical mode of initiation in the DHFR and other origins of replication
While the nascent strand assay was being improved in other laboratories, two powerful two-dimensional (2-D) gel replicon mapping techniques came on the scene in 1987; their application completely changed our understanding of the DHFR origin and seemed to unify almost all of the data from our own and other laboratories into one picture. These two methods provide a comprehensive view of replication fork movement that simply cannot be obtained by any of the methods discussed previously. In our view, they represent the gold standards for replicon mapping, primarily because they are, by definition, more than one-dimensional.
The first method exploits the different electrophoretic mobilities of restriction fragments depending on whether they are linear or contain internal bubbles (start sites), single forks, or X-shaped termination structures (Brewer and Fangman 1987). The method was developed to determine whether ARS elements from S. cerevisiae behave as origins in the chromosome, but we were able to adapt it to the DHFR locus in CHOC400 cells owing to the locus's high copy number (Vaughn et al. 1990a). The challenge was to keep fragile replication intermediates intact, as the very long mammalian chromosomal DNA fibers are inordinately susceptible to shear. In our initial studies, DNA from early S-phase or log-phase cultures was cross-linked in vivo with psoralen, purified by standard extraction methods, and digested with a restriction enzyme. Alternatively, we took advantage of the observation that DNA remains supercoiled and attached to an insoluble proteinaceous matrix after histones and other nonhistone chromosomal proteins are extracted from nuclei with isotonic LIS (Mirkovitch et al. 1984). These DNA/halo structures were then digested to completion with a restriction enzyme, and the replication intermediates were enriched by adsorption to benzoylated naphthoylated diethylaminoethyl (BND) cellulose (Vaughn et al. 1990a). The resulting digests were separated by neutral/neutral 2-D gel electrophoresis (Brewer and Fangman 1987) and transferred to membranes.
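A standard interpretive rule for neutral/neutral 2-D gels concerns the mass at which a bubble arc gives way to a single-fork arc. The sketch below assumes a single bidirectional initiation at a fixed position within a fragment and equal fork rates: the bubble reaches the nearer fragment end after twice that distance has been replicated, so the molecule's mass relative to the unreplicated fragment (1n) is 1 + 2d/N at the transition, where d is the distance from the start site to the nearer end and N is the fragment length. The function names are hypothetical.

```python
import math

# Neutral/neutral 2-D gel: where does a bubble arc convert to a single-fork (Y) arc?
# Assumptions: one bidirectional initiation per molecule at start_site_kb,
# equal fork rates, fragment ends defined by the restriction sites.


def transition_mass(fragment_kb: float, start_site_kb: float) -> float:
    """Mass relative to the unreplicated fragment (1.0 = 1n, 2.0 = 2n) at which
    the bubble reaches the nearer fragment end and the molecule becomes a Y."""
    if not 0.0 <= start_site_kb <= fragment_kb:
        raise ValueError("start site must lie within the fragment")
    nearer_end_kb = min(start_site_kb, fragment_kb - start_site_kb)
    # Each fork has traveled nearer_end_kb, so 2 * nearer_end_kb has been replicated.
    return 1.0 + 2.0 * nearer_end_kb / fragment_kb


def expected_pattern(fragment_kb: float, start_site_kb: float) -> str:
    """Describe the arc expected for a given start-site position."""
    mass = transition_mass(fragment_kb, start_site_kb)
    if math.isclose(mass, 1.0):
        return "single-fork arc only (start site at a fragment end)"
    if math.isclose(mass, 2.0):
        return "complete bubble arc (centered start site)"
    return f"bubble arc converting to a single-fork arc at {mass:.2f}n"


for site in (3.0, 1.5, 0.0):
    print(f"6-kb fragment, start at {site} kb:", expected_pattern(6.0, site))
```

When different molecules initiate at different positions within the same fragment, or are replicated passively by entering forks, the corresponding patterns superimpose on the same blot.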
Surprisingly, hybridization with probes from the intergenic spacer detected faint bubble arcs not only in the fragments containing ori-β and ori-γ but also in several other fragments within the spacer, regardless of the method used to purify the DNA (Vaughn et al. 1990a; Fig. 2b). However, the bubble arcs were accompanied by much stronger single-fork arcs, indicating that these fragments were often replicated passively. Since it was shown that forks from upstream or downstream origins had not yet entered the locus at this early time in S phase, we reasoned that the single forks must have arisen from neighboring fragments within the intergenic spacer itself. Importantly, no bubbles were observed in the DHFR gene, regardless of the position in S phase, consistent with all of our other studies. Since we also detected faint bubble arcs in fragments isolated from log-phase cultures, it was unlikely that the synchronization protocol itself generated this unusual initiation pattern (Vaughn et al. 1990a; Dijkwel et al. 1994).
In the same study, we adapted a second powerful technique, neutral/alkaline 2-D gel electrophoresis, which was also developed to study yeast origins in chromosomes (Nawotka and Huberman 1988). The method essentially measures fork direction in any given fragment for which suitable probes can be developed. The first-dimension gel again separates replication intermediates by size; the lane is then excised, soaked in alkali, and turned through 90°, and the nascent strands are released from their templates by electrophoresis in alkali. After transfer to a membrane, hybridization with small probes reveals the sizes of the nascent strands that contain each probed sequence and hence where those strands originated. If a probe centered in a fragment detects nascent strands of all sizes, initiation must have occurred at the position of the probe. If an end probe detects a complete nascent strand arc, then single forks must be entering the fragment from outside through that end. If probes from each end detect complete nascent strand arcs, then the fragment is being replicated passively from both directions.
These three outcomes are exactly what we observed in several fragments from the intergenic region, indicating not only that initiation was occurring in the middle of many fragments but also that the same fragments were replicated passively in many cells by forks entering from either side. Again, all of this was happening early in S phase, before forks could have entered the region from distant origins.
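The neutral/alkaline interpretive rules described above can be written out as a small decision function. This is a hypothetical encoding for clarity only: the inputs are simplified yes/no observations for a fragment's center probe and two end probes, whereas a real analysis would weigh signal intensities rather than Boolean calls.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ProbeCalls:
    """Simplified neutral/alkaline 2-D gel observations (hypothetical encoding)."""
    center_detects_all_sizes: bool   # center probe hybridizes to nascent strands of all sizes
    left_end_complete_arc: bool      # left-end probe detects a complete nascent-strand arc
    right_end_complete_arc: bool     # right-end probe detects a complete nascent-strand arc


def interpret(calls: ProbeCalls) -> list[str]:
    """Apply the three interpretive rules to one fragment."""
    conclusions = []
    if calls.center_detects_all_sizes:
        conclusions.append("initiation occurs near the center probe")
    if calls.left_end_complete_arc and calls.right_end_complete_arc:
        conclusions.append("replicated passively from both directions")
    elif calls.left_end_complete_arc:
        conclusions.append("forks enter from the left end")
    elif calls.right_end_complete_arc:
        conclusions.append("forks enter from the right end")
    return conclusions


# The composite result reported for intergenic fragments: all three rules satisfied.
print(interpret(ProbeCalls(True, True, True)))
```

The rules are not mutually exclusive, which is why a single fragment can be scored as both initiating and passively replicated when, as here, different cells in the population use different modes.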
This study drew three general criticisms, both from our own and from other laboratories: (1) the matrix-enrichment scheme used to purify intermediates for analysis on 2-D gels might influence the results; (2) bubble- and single-fork-containing fragments might migrate differently from their bona fide counterparts detected in ARS elements in yeast; and (3) amplification itself might somehow have deranged the normal initiation reaction in the DHFR locus. We had essentially ruled out the first possibility by demonstrating that the matrix-enrichment scheme shows no bias for or against any particular type of replication intermediate (Vaughn et al. 1990a, b; Dijkwel et al. 1991). To address the second possibility, yeast and CHO restriction digests were mixed and run together in the same neutral/neutral 2-D gel (Brun et al. 1995). Probes were selected for initiating fragments of identical size in the two digests and were labeled with different-colored fluors. When the signals for each of the bubble-containing fragments on the blots were merged, they were clearly identical in shape. Likewise, the single-fork arc in the CHO DNA was coincident with that of a nonorigin fragment from yeast.
To address the third possibility (derangements resulting from amplification), it was clear that we needed to perform 2-D gel analysis on the diploid DHFR locus in CHO cells, which required at least a ∼300-fold increase in sensitivity and a similar increase in the purity of the replication intermediates. We were able to take advantage of the observation that replication forks selectively and quantitatively partition with the 8–10% of the DNA that remains attached to the nuclear matrix after digestion of matrix/DNA halo structures with a six-mer restriction enzyme (Vaughn et al. 1990b). Subsequent chromatography over BND cellulose provided an additional 10–15-fold enrichment, resulting in sufficiently pure preparations to allow analysis on 2-D gels (Dijkwel et al. 1991).
When the single-copy DHFR locus in early S-phase CHO cells was analyzed on neutral/neutral 2-D gels (Dijkwel and Hamlin 1995), the result was virtually identical to that obtained with CHOC400 cells: every restriction fragment within the intergenic spacer showed a complete bubble arc along with a more prominent single-fork arc, a pattern we have come to recognize as the signature of fragments residing in initiation zones in early S phase. This same composite pattern was also observed in the ori-β-containing fragment in log-phase cells. Furthermore, neutral/alkaline 2-D gel analysis showed that forks move in both directions in the intergenic region but only outward through the DHFR gene. Thus, we could conclude that the delocalized initiation mode observed in CHOC400 cells was not a consequence of amplification per se.
Importantly, both this study and earlier ones on CHOC400 cells showed that initiation occurred in only a fraction of cell cycles, since single-fork arcs indicative of passive replication persisted in the intergenic region hours after initiation in the locus ceased (Vaughn et al. 1990b; Dijkwel and Hamlin 1992, 1995). Thus, it appears that this origin is very inefficient overall, probably firing in fewer than 30% of cell cycles, based on the fact that single forks persist in the locus for 6 h even though initiation ceases after about 2 h. Studies summarized below suggest that this may be true for most mammalian origins. It was later shown that this same dispersive initiation pattern characterizes the rhodopsin origin in CHO cells (Dijkwel et al. 2000), and 2-D gel approaches demonstrated the same characteristics for the rDNA origin residing in the nontranscribed spacer of human cells (Little et al. 1993). Thus, the term “initiation zone” haltingly became a part of the replication lexicon.
Cell synchronizing regimens become an issue
I learned many years ago that if one released CHO cells from a G0 block into hydroxyurea (HU), an inhibitor of ribonucleotide reductase, for times longer than about 10 h, the cells slowly leaked into the S period (Hamlin and Pardee 1976; J.L. Hamlin, unpublished observations). This made sense because HU is not expected to prevent origins from firing but only to slow replication forks as they move away from their start sites. Several years later, HU was generally replaced in mammalian cell synchronization protocols with aphidicolin, a tetracyclic diterpene that inhibits replicative DNA polymerases competitively with respect to dCTP (see Huberman 1981 for an early review). Although aphidicolin gives a reasonably tight block as cells enter S phase, it is also somewhat leaky: relatively insensitive fluorescence-activated cell sorter analysis suggests that cells are still lined up at the G1/S boundary when collected with aphidicolin, but the far more sensitive neutral/neutral 2-D gel analysis of the DHFR origin showed that replication forks had actually traveled considerable distances after only 12 h in this drug (Vaughn et al. 1990a; Mosca et al. 1992).
About this time, the plant amino acid, mimosine, was discovered to have effects on the mammalian cell cycle (Lalande 1990), and we showed with 2-D gels that it essentially prevents replication fork formation at origins (Dijkwel and Hamlin 1992; Mosca et al. 1992). Although there is still uncertainty as to how this inhibitor actually works (Mosca et al. 1992; Gilbert et al. 1995), the fact that it synchronizes an advancing G1 population prior to chain elongation was instrumental in our being able to detect bubble arcs in single-copy loci.
These observations raise the question of whether sequences that are radiolabeled as cells enter S phase after a lengthy aphidicolin or HU block are really representative of the earliest-replicating sequences in undisturbed cultures (Heintz and Hamlin 1982; Tribioli et al. 1987). Indeed, a prodigious amount of new information suggests that cells activate checkpoints when established forks are slowed (reviewed in Harper and Elledge 2007), and a single-molecule fiber fluorescence in situ hybridization analysis showed clearly that inchoate origins can be activated (or their timing changed) when nucleotide pools are lowered in S-phase cells (Anglana et al. 2003). These phenomena could figure importantly in the double-block synchronization protocols required for cell types that cannot be arrested in G0 by nutritional or serum starvation, because the first step in such protocols is to add HU or aphidicolin directly to asynchronous cells, many of which are in S phase.
Results from nascent strand abundance and 2-D gel assays begin to converge—but not meet
During the next few years, the small nascent strand abundance assay was improved to correct for differences in PCR amplification efficiency among different sequences (Pelizon et al. 1996). Each PCR reaction was spiked with a known amount of a competitor template: a cloned copy of the region being queried to which an additional 20 bp of unrelated sequence had been added, so that its product could be distinguished by size from the product of the endogenous sequence. Nascent strand concentrations could then be compared with these internal standards. In a reanalysis of the DHFR origin from log-phase CHO cells (again focusing only on the ori-β region), the results obtained with this improvement were essentially in agreement with the earlier published work (Vassilev et al. 1990).
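The logic of a competitive PCR titration can be sketched as follows. This is a minimal, generic illustration, not the published protocol: it assumes that target and competitor amplify with equal efficiency, so the ratio of their products tracks the ratio of their input copies, and it fits a line to log(ratio) versus log(competitor input) to find the equivalence point where the ratio equals 1. The function name and the titration values are hypothetical.

```python
import math


def equivalence_point(competitor_copies, product_ratios):
    """Estimate target copies from a competitive PCR titration (hypothetical helper).

    competitor_copies: known input copies of the size-tagged competitor in each reaction.
    product_ratios: measured target/competitor product ratios for the same reactions.
    Assumes target and competitor amplify with equal efficiency, so that
    log10(ratio) falls on a straight line against log10(competitor input).
    """
    xs = [math.log10(c) for c in competitor_copies]
    ys = [math.log10(r) for r in product_ratios]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # close to -1 when the equal-efficiency assumption holds
    intercept = mean_y - slope * mean_x
    # The equivalence point is where the ratio is 1, i.e. log10(ratio) = 0.
    return 10 ** (-intercept / slope), slope


if __name__ == "__main__":
    # Hypothetical, noise-free titration of 1,000 target copies.
    true_target = 1000.0
    competitors = [40.0, 200.0, 1000.0, 5000.0, 25000.0]
    ratios = [true_target / c for c in competitors]
    estimate, slope = equivalence_point(competitors, ratios)
    print(f"estimated target copies: {estimate:.0f} (slope {slope:.2f})")
```

A fitted slope far from -1 is a useful warning that the two templates are not amplifying equally, which is exactly the problem the size-tagged competitors were meant to control.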
About the same time, an extremely interesting system was developed in which nuclei purified from CHOC400 cells in late-G1 (but prior to S-phase entry) were shown to initiate replication (apparently faithfully) when introduced into an in vitro replication cocktail prepared with Xenopus laevis egg extracts (Gilbert et al. 1995). This hybrid system allowed early-replicating regions of the genome to be labeled to very high specific activity with ³²P-labeled dNTPs. When this labeled material was utilized to probe an extensive series of single-copy cloned fragments from the DHFR intergenic region, the results again indicated that the ori-β region was the most intensely labeled. However, this series of probes again excluded the entire 30-kb region encompassing the ori-γ locus, bolstering the false assumption that ori-β contained the major start site in the region.
A few years later, it was realized by many that, regardless of the precision of the measurements, BrdU labeling created an AT-rich bias in the nascent DNA that was being analyzed. Without it, however, there would be nothing to distinguish nascent DNA from small broken template DNA of the same size, which constitutes the major chemical mass in the sucrose gradient fractions. The alkaline conditions used to melt and subsequently separate the DNA were obviously also contributing to breakage. In a real tour de force, a method was developed in which λ-exonuclease was used to digest away small DNA resulting from shear, while leaving the RNA-primed nascent strands intact (Bielinsky and Gerbi 1998). The method was perfected on ARS1 from S. cerevisiae and yielded such a clean product in the authors' hands that it was even possible to map the positions of the RNA–DNA transitions at start sites to the base pair, as had been accomplished previously for SV40 (Hay and DePamphilis 1982). In a recent application to a mammalian origin, the RNA–DNA transitions at start sites were also mapped to the base pair in the human lamin B2 origin (Abdurashidova et al. 2000). This origin was discovered in an early-labeling strategy (Tribioli et al. 1987) and subsequently fine-mapped by an earlier version of the nascent strand abundance assay (Giacca et al. 1994). It is one of the few examples of a mammalian origin that appears to behave like the replicators of simple genomes. Interestingly, the lamin B2 origin resides in a very narrow intergenic region; therefore, it may be able to accommodate only one of the prereplication complexes required to effect initiation. In fact, certain mutated versions of a short sequence encompassing this locus do initiate less efficiently when tested at ectopic positions, which supports the replicator concept for this locus (Paixao et al. 2004).
Inspired by this apparent improvement, a more extensive analysis was performed on the 12-kb region encompassing both of the 6-kb EcoRI ELFs that were originally identified in the DHFR locus (Burhans et al. 1986; Leu and Hamlin 1989), but once again excluding the region to the right encompassing ori-γ. A much larger number of closely spaced primer pairs were utilized in competitive PCR assays on nascent early S-phase DNA prepared either by BrdU labeling and immunoprecipitation or without BrdU but with λ-exonuclease treatment (Kobayashi et al. 1998). In this report, there was precious little difference in the apparent distribution of nascent strands obtained by the two methods. The important finding, however, was that focusing on this extended region now illuminated two peaks: one corresponding to ori-β and one to its immediate right, which was termed ori-β′. As it turns out, each peak resides in one of the two 6-kb EcoRI ELFs.
In hindsight, the focus on this 12-kb region as opposed to the entire 55-kb spacer is understandable, given the emotional bias toward the 6-kb ELFs, as well as the huge number of competitor clones and primer sets that would have been required to scan the whole locus. The latter problem has been partially alleviated with the advent of real-time PCR, which is now used in place of the competitive PCR reaction (e.g., Wang et al. 2004).
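For readers less familiar with real-time PCR, the usual way of expressing nascent strand enrichment at one locus relative to another is a ΔΔCt calculation, sketched below. This is the generic textbook approach, offered only as an illustration; it is not claimed to be the exact analysis used in the studies cited here. It assumes both primer pairs amplify with the same per-cycle efficiency (2.0 for perfect doubling) and uses genomic DNA as the normalizer. The function name and Ct values are hypothetical.

```python
def relative_enrichment(ct_nascent_test, ct_nascent_ref,
                        ct_genomic_test, ct_genomic_ref, efficiency=2.0):
    """Fold enrichment of a test locus over a reference locus in nascent DNA,
    normalized to the same primer pairs on genomic DNA (a delta-delta-Ct calculation).

    efficiency: amplification factor per cycle (2.0 = perfect doubling), assumed
    to be the same for both primer pairs.
    """
    # Lower Ct means more starting template, so a negative delta means the
    # test locus is more abundant than the reference locus.
    delta_nascent = ct_nascent_test - ct_nascent_ref
    delta_genomic = ct_genomic_test - ct_genomic_ref
    delta_delta = delta_nascent - delta_genomic
    return efficiency ** (-delta_delta)


if __name__ == "__main__":
    # Hypothetical Ct values: a candidate origin vs. a putative nonorigin reference.
    fold = relative_enrichment(24.0, 28.0, 22.0, 22.5)
    print(f"enrichment over reference: {fold:.1f}-fold")
```

Because the calculation only corrects for primer-pair differences measured on genomic DNA, it inherits whatever biases that normalizer carries, a point taken up later in this section.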
We realized that, had the whole intergenic region been analyzed with such precision, other initiation sites (ori-γ among them) might have been detected. Since designing all the necessary primer pairs would have been a formidable task, we labeled nascent strands in vitro with ³²P-dCTP after releasing CHOC400 cells from a G1/S mimosine block, isolated the 300–1,000-nt labeled nascent strands on gels, and hybridized them to dot blots of 22 small single-copy sequences spread across the intergenic region and separated from one another by more than 1,500 bp.
The pattern we saw was striking (Fig. 2, solid curve; Dijkwel et al. 2002). The ori-β and ori-γ regions were clearly the most highly labeled, but a shoulder appeared on the right side of the ori-β peak, which coincides with the ori-β′ peak detected in the independent study (Kobayashi et al. 1998). The central region of the spacer appeared to be devoid of nascent strands, in agreement with the pattern suggested by the earlier in-gel renaturation ELF studies (Leu and Hamlin 1989). Importantly, this experiment showed that under these circumstances, initiation must be occurring at least at each of the 22 positions represented by the dot-blotted fragments, since the distances between these fragments are greater than the size of the nascent strand probe itself (300–1,000 nt). In our minds, these data constitute one of the most compelling arguments that the DHFR origin corresponds to an initiation zone. Using a very similar strategy but with a larger number of probes, a different laboratory arrived at the same conclusion (Sasaki et al. 2006).
Some aspects of the small nascent strand abundance assay are worth considering, particularly the number of nascent strands present in a log-phase culture at any instant and the difficulty of purifying them away from template. If the average fork rate is ∼2 kb a minute, then any region encompassing an initiation site will contain nascent strands 2 knt in length for only ∼0.5 min. If the cell cycle time is 20 h (1,200 min), then only 1/2,400 of the DNA corresponding to this 2-kb region will represent nascent strands, with the vast majority corresponding to nonreplicating template with exactly the same chemical properties. Typically, one starts with 10⁸ cells, which contain 2 × 10⁸ copies of the locus in question, of which 8 × 10⁴ (1/2,400) will be nascent. If there are ∼5 × 10⁴ origins in the cell, then a saturating preparation of nascent strands for all sites would correspond to 4 × 10⁹ 2-knt nascent strands. However, our current whole-genome studies indicate that at least half of the origins are zones (Mesner et al., submitted). Thus, assuming that the average zone has at least three independent start sites, the total number of nascent strands would be 2 × 10⁹ + (2 × 10⁹)(3), or about 8 × 10⁹ nascent strands. With an expected efficiency of ∼50%, this would amount to 5–10 ng of purified nascent DNA starting with 600 μg from 10⁸ cells, a formidable purification obstacle.
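The back-of-the-envelope budget above can be reproduced in a few lines, which makes it easy to see how the conclusion shifts if any input changes. The cell number, dwell time, cycle time, origin number, zone fraction, start sites per zone, and 50% efficiency are taken from the text. Two values are assumptions added here: an average nucleotide mass of 330 g/mol and ∼6 pg of DNA per unreplicated diploid cell. With these assumptions the recovered mass comes out at roughly 4.6 ng (about 9 ng before the 50% efficiency is applied), the same order as the 5–10 ng estimate; the exact figure depends on the assumed nucleotide mass and on how the efficiency term is applied.

```python
AVOGADRO = 6.022e23        # molecules per mole
NT_MASS_G_PER_MOL = 330.0  # assumption: average mass of one nucleotide of DNA
DNA_PG_PER_CELL = 6.0      # assumption: DNA content of one unreplicated diploid cell


def nascent_strand_budget(cells=1e8, copies_per_cell=2, dwell_min=0.5,
                          cycle_min=1200.0, origins=5e4, zone_fraction=0.5,
                          sites_per_zone=3, strand_nt=2000, recovery=0.5):
    """Reproduce the yield estimate for small nascent strands (sketch)."""
    # Fraction of the cycle during which a given site carries ~2-knt nascent strands.
    nascent_fraction = dwell_min / cycle_min
    per_site = cells * copies_per_cell * nascent_fraction
    # Origins acting as single sites contribute once; zones contribute once per start site.
    single = origins * (1.0 - zone_fraction) * per_site
    zoned = origins * zone_fraction * sites_per_zone * per_site
    total_strands = single + zoned
    grams = total_strands * strand_nt * NT_MASS_G_PER_MOL / AVOGADRO
    return {
        "nascent_fraction": nascent_fraction,
        "per_site": per_site,
        "total_strands": total_strands,
        "recovered_ng": grams * recovery * 1e9,
        "template_ug": cells * DNA_PG_PER_CELL * 1e-6,
    }


if __name__ == "__main__":
    b = nascent_strand_budget()
    print(f"1/{1 / b['nascent_fraction']:.0f} of each site is nascent")
    print(f"{b['per_site']:.2e} nascent copies per site, {b['total_strands']:.2e} strands total")
    print(f"~{b['recovered_ng']:.1f} ng recovered from ~{b['template_ug']:.0f} ug of DNA")
```

Dividing the two masses gives a target-to-template ratio of order 10⁻⁵, which is the quantitative core of the purification problem described above.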
In early incarnations, “nascent strands” were obtained after a BrdU pulse, followed only by heating or treatment with alkali and a sizing step on sucrose gradients. Eliminating the BrdU pulse and introducing λ-exonuclease in some laboratories can apparently provide a cleaner preparation, although this enzyme greatly prefers double-stranded templates for full activity. Even in the most recent studies, the variation in yields among samples prepared in the same lab from the same cells can be as much as sixfold (Cadoret et al. 2008). Thus, the degrees of purity can be quite variable. Rarely is there an attempt to analyze the exact composition of the material that is considered to be “nascent” except by searching for the presence of sites that were identified in material purified by the very same method (e.g., lamin B2 and c-myc).
Trying to reconcile data from different views of the DHFR origin
Needless to say, for those not working directly on the DHFR locus, it was extremely difficult to sort out the conflicting pictures being painted by the various laboratories studying it. The early in-gel renaturation studies (Leu and Hamlin 1989), the in vitro ELF data (Dijkwel et al. 2002), and the 2-D gel analyses (Vaughn et al. 1990a; Dijkwel and Hamlin 1992) all indicated that there were at least two preferred initiation regions (ori-β and ori-γ) within the intergenic spacer, not just one at ori-β. Rather than corresponding to fixed sites, these regions appeared to be just somewhat preferred over many other sites scattered throughout the spacer, with a relative dead zone approximately in its center (Leu and Hamlin 1989; Dijkwel et al. 2002). Only in the most recent nascent strand abundance study from another laboratory was it suggested that there was more than one preferred start site (ori-β and ori-β′ at least; Kobayashi et al. 1998), and the title of that paper used the moniker “initiation zone” to describe the DHFR origin.
Even these data were not entirely in agreement with 2-D gel analyses. The region between ori-β and ori-β′ appeared to be devoid of small nascent strands in the most extensive small nascent strand analysis (Kobayashi et al. 1998), again reinforcing the concept that the two corresponded to fixed sites and therefore probably replicators. However, we were able to demonstrate that a small fragment straddling this presumed negative region actually displays a bubble arc on 2-D gels (Dijkwel et al. 2002). If bubble-containing fragments give rise to small nascent strands, as most of us assume, this result suggests that perhaps the small nascent strand preparations are not saturating. If that were the case, then it would not be surprising to find them populated only with the most efficient initiation sites. Importantly, no amount of PCR amplification could detect the less efficient start sites if they are not present in the preparation, although, in theory, they could constitute the majority of start sites in the genome.
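The saturation argument can be made semi-quantitative with a deliberately crude signal-to-background model, sketched below. It borrows the per-site numbers from the yield estimate earlier in this section (8 × 10⁴ nascent copies for a site that fired in every cycle, 2 × 10⁸ template copies, 50% recovery) and adds two hypothetical parameters: the fraction of template carried through as small broken DNA (`contamination`) and a twofold calling threshold. Neither of these two values comes from the studies discussed here; the point is only that, for any fixed background, sites below some firing efficiency cannot be distinguished from it.

```python
def apparent_enrichment(efficiency, nascent_if_always=8.0e4, recovery=0.5,
                        template_copies=2.0e8, contamination=1e-5):
    """Signal at a site relative to background in a toy nascent strand preparation.

    efficiency: fraction of cell cycles in which the site fires.
    nascent_if_always: nascent copies the site would contribute if it fired every cycle.
    recovery: fraction of those nascent strands surviving purification.
    template_copies: copies of the locus in the starting cells.
    contamination: hypothetical fraction of template copies carried through as small broken DNA.
    """
    nascent = nascent_if_always * efficiency * recovery
    background = template_copies * contamination
    return (nascent + background) / background


def min_detectable_efficiency(threshold=2.0, nascent_if_always=8.0e4, recovery=0.5,
                              template_copies=2.0e8, contamination=1e-5):
    """Lowest firing efficiency whose apparent enrichment reaches a (hypothetical) calling threshold."""
    background = template_copies * contamination
    return (threshold - 1.0) * background / (nascent_if_always * recovery)


if __name__ == "__main__":
    for eff in (0.3, 0.05, 0.01):
        print(f"efficiency {eff:.2f}: {apparent_enrichment(eff):.1f}-fold over background")
    print(f"sites firing in fewer than {min_detectable_efficiency():.0%} of cycles are missed")
```

Raising the contamination fraction or lowering recovery pushes the cutoff upward, so a less pure preparation is expected to report an even smaller subset of the most efficient sites.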
A few other points are worth making here. Since none of the nascent strand preparations that we and others have utilized to localize start sites is very pure, one normally uses total genomic DNA isolated from log-phase cells as a normalization standard. Because early-replicating loci are overrepresented in this kind of sample, not all sequences in this standard are at equal copy number (although the variance is not huge). Another extremely important observation was made recently: there is an apparent futile replication cycle of double-stranded DNA sequences, which results in amplification of the template as much as 40–50-fold (Gomez and Antequera 2008). This material is ∼200 bp long, is RNA-primed, and is highly enriched in CpG islands, but there is no solid evidence that any of it ever matures into finished replicons. Because this very abundant material sediments dangerously close to the fractions that are selected as “small nascent strands” in sucrose gradients, some nascent strand preparations may contain a major contaminant that does not derive from legitimate origins.
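Why the log-phase genomic standard is not at equal copy number, and why the variance is nonetheless modest, can be illustrated with the classic marker-frequency argument for an ideal exponentially growing culture. Assuming the steady-state cell-age density f(a) = (2 ln 2 / T)·2^(−a/T) and no cell-to-cell variation in timing, a sequence replicated at cell age t has a mean copy number of 2^(1 − t/T) relative to one unreplicated copy, so the early/late ratio can never exceed twofold. The 20-h cycle matches the yield estimate above; the replication ages of 8 h and 16 h are hypothetical.

```python
import math


def relative_copy_number(replication_age_h, cycle_h):
    """Mean copies per cell of a sequence, relative to one unreplicated copy, in an
    ideal exponentially growing culture (age density f(a) = (2 ln 2 / T) * 2 ** (-a / T)).

    replication_age_h: cell age (hours after division) at which the sequence replicates.
    """
    if not 0.0 <= replication_age_h <= cycle_h:
        raise ValueError("replication age must lie within the cell cycle")
    # Fraction of cells older than t is 2 ** (1 - t / T) - 1; add the one copy every cell has.
    return 2.0 ** (1.0 - replication_age_h / cycle_h)


def fraction_replicated_numeric(replication_age_h, cycle_h, steps=100_000):
    """Midpoint-rule integral of the age density from t to T, as a check on the closed form."""
    k = math.log(2.0) / cycle_h
    width = (cycle_h - replication_age_h) / steps
    total = 0.0
    for i in range(steps):
        age = replication_age_h + (i + 0.5) * width
        total += 2.0 * k * math.exp(-k * age) * width
    return total


if __name__ == "__main__":
    cycle = 20.0             # cell cycle time, h
    early, late = 8.0, 16.0  # hypothetical cell ages at which an early and a late sequence replicate
    ratio = relative_copy_number(early, cycle) / relative_copy_number(late, cycle)
    print(f"early/late copy-number ratio in log-phase DNA: {ratio:.2f}")
```

Under these assumptions the ratio is 2^((16 − 8)/20), about 1.3-fold; real cultures depart from the ideal age distribution, so this is an order-of-magnitude guide rather than a correction factor.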
Thus, what we all call nascent strands could be significantly contaminated with broken, nonnascent DNA, could fail to be saturating, and could even include CpG-island-enriched DNA that really has nothing to do with true origins.
Genetic analyses identify required cis-elements
The ability to perform 2-D gel analysis on diploid loci in mammalian cells marked an important shift in emphasis in our laboratory, as we no longer had to rely on the amplified cell line to analyze this complex origin. This opened the way for genetic studies on the single-copy locus that ultimately allowed a search for any genetic elements that are required to effect initiation in the spacer. Owing to the availability of CHO variants that had only a single DHFR locus, as well as several dhfr-deficient derivatives that were missing parts of the gene, we were able to develop a strategy for restoring the gene to wild type while introducing small deletions into the downstream origin (Kalejta et al. 1998).
We first addressed the possibility that, although initiation occurs at an array of sites in the spacer in different cells in the population, ori-β, ori-β′, or ori-γ might actually correspond to true replicators. One would only have to assume that, after melting the helix at these sites, a helicase migrates back and forth in a random walk before the RNA primer is loaded (i.e., initiation and chain elongation are uncoupled in time). To address this possibility, we first deleted ori-β, then a combination of ori-β and ori-β′, and finally the entire 45 kb that encompasses >90% of the start sites in the 55-kb intergenic spacer (Kalejta et al. 1998; Mesner et al. 2003). The results of this analysis were striking: deleting ori-β, or ori-β together with ori-β′, had no discernible effect on initiation in the rest of the spacer, as judged by 2-D gel analysis or by a replication timing assay. Surprisingly, even when the central 45 kb was deleted, initiation in the truncated 10-kb spacer actually increased in efficiency, with the consequence that the locus still initiated and replicated in early S phase (Mesner et al. 2003). These experiments showed clearly that there are no critical nonredundant elements in the DHFR origin itself that are required to effect initiation.
The question then arose whether any cis-regulatory elements are necessary for maximum origin activity. It turns out that sequences in the promoter of the DHFR gene that are required for transcription are also required to enhance the downstream origin since, in their absence, the overall efficiency of the origin decreases (Saha et al. 2004). Amazingly, however, the promoter deletion allowed the initiation zone to spread into the body of the now-inactive gene. This important observation showed clearly that even the bodies of genes contain inchoate initiation sites that are silenced by read-through transcription. This view was reinforced by the observation that deletion of the 3′ termination signals of the DHFR gene, which allowed transcription of all but 8 kb of the intergenic spacer, completely prevented this region from initiating replication (Mesner and Hamlin 2005). In the remaining 8-kb intergenic region near the end of the 2BE2121 gene, the efficiency of initiation was again greatly increased over that of the same sequences in the wild-type arrangement, presumably because some limiting factors required for initiation are now distributed over a much smaller number of potential initiation sites. Finally, we recently demonstrated that fragments containing either ori-β, ori-β′, or a fragment from the body of the DHFR gene all initiate with approximately the same efficiency when inserted at ectopic anonymous positions in the CHO genome (Lin et al. 2005). This is in spite of the fact that the gene fragment never initiates at its native locus.
Even these relatively straightforward results have not deterred other laboratories from attempting to determine the sequence elements in ori-β that render it an efficient site relative to other sites in the spacer (although we have calculated that the 4-kb region encompassing it fires in fewer than 5% of cell cycles in a log population; J.L. Hamlin, unpublished). The approach has been to introduce small deletions into a cloned fragment encompassing ori-β, to insert it at either random anonymous or specific ectopic positions in a human cell line, and then to use the small nascent strand abundance assay to measure activities relative to some local marker adjudged to be initiation negative (Altman and Fanning 2001, 2004; Gray et al. 2007). There are, indeed, small sequences that can be deleted from ori-β that lower its efficiency, suggesting that they may help to attract the initiation machinery to this site in vivo. Given that none of the sites in the spacer are essential (including ori-β), we suggest that the old idea of replicators may not pertain to sites within initiation zones. As mentioned earlier, the lamin B2 origin appears to require certain sequences for full activity (Paixao et al. 2004), as do the c-myc and β-globin initiation zones (Liu et al. 2003; Wang et al. 2004).
Testing this general model at the genome-wide level
All of these studies on the DHFR locus have led us to a model in which the entire mammalian genome is peppered with a hierarchy of degenerate initiation sites, the use of which is governed by local transcription both positively and negatively. We have used the analogy of DNase I cleavage in naked DNA: some sites are clearly preferred over others, yet we do not consider DNase I or micrococcal nuclease to be site-specific endonucleases (Hamlin et al. 2008). Of course, this model arose largely from our studies on the DHFR locus and departs considerably from the expectation that regulation of replication in mammals generally would obey the same rules that apply to lower organisms. We therefore wanted to devise a method for isolating and mapping all of the active initiation sites in the genome in order to test our general model.
We devised a strategy for purifying restriction fragments that contained active initiation sites in vivo (Mesner et al. 2006) and have made recombinant libraries from a cancerous human cell variant (HeLa) as well as from the near-normal lymphoblastoid GM06990 cell line. The strategy involves the preferential entrapment of bubble-containing restriction fragments in gelling agarose. After purification and cloning of the entrapped fragments, the libraries were hybridized to microarrays representing the 1% of the human genome under study by the ENCODE consortium (Mesner et al., submitted). These studies show that half of the fragments in these libraries map at isolated positions in the genome and thus could contain single sites or small zones, while the other half cluster together in groups of two to 18 fragments and probably represent initiation zones (the average is ∼18 kb in length, with a range of 2 to 180 kb). Interestingly, the signal strengths of isolated fragments that could contain relatively fixed sites are well below the signals of the average fragment in a zone, suggesting that isolated sites are not generally more active than sites within zones. Origin distributions show clearly that intergenic sequences are not preferred over genic ones. This finding is consistent with studies on the DHFR promoter deletion showing that the gene itself is a perfectly good substrate for initiation when it is not transcribed (Saha et al. 2004) or when it is placed, untranscribed, at an ectopic position (Lin et al. 2005). All these findings therefore suggest to us that there is unlikely to be a strong selection for “replicators” in mammalian genomes. However, there is a significant association between bubble-containing fragments and neighboring active genes (Mesner et al., submitted), supporting the long-appreciated relationship between active transcription and early replication.
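The distinction between isolated fragments and clustered fragments (zones) amounts to grouping mapped intervals by proximity. The sketch below shows one simple way to do that; it is not the pipeline used in the study. The `Fragment` type, the `max_gap` parameter (how far apart two fragments may lie and still count as one cluster), and the example coordinates are all hypothetical; with contiguous restriction fragments a gap of 0 would require the fragments to abut.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Fragment:
    chrom: str
    start: int  # bp, 0-based, inclusive
    end: int    # bp, exclusive


def cluster_fragments(fragments: List[Fragment], max_gap: int = 0) -> List[List[Fragment]]:
    """Group mapped fragments: a fragment joins the current cluster if it lies on the
    same chromosome and starts no more than max_gap bp after the cluster's end."""
    clusters: List[List[Fragment]] = []
    cluster_end = 0
    for frag in sorted(fragments, key=lambda f: (f.chrom, f.start)):
        if (clusters and clusters[-1][0].chrom == frag.chrom
                and frag.start - cluster_end <= max_gap):
            clusters[-1].append(frag)
            cluster_end = max(cluster_end, frag.end)
        else:
            clusters.append([frag])
            cluster_end = frag.end
    return clusters


def summarize(clusters: List[List[Fragment]]):
    """Return (number of isolated fragments, number of zones, zone lengths in bp)."""
    isolated = sum(1 for c in clusters if len(c) == 1)
    zones = [c for c in clusters if len(c) > 1]
    lengths = [max(f.end for f in c) - min(f.start for f in c) for c in zones]
    return isolated, len(zones), lengths


if __name__ == "__main__":
    # Hypothetical trapped fragments; the first three abut or nearly abut.
    frags = [
        Fragment("chr1", 1_000, 5_000), Fragment("chr1", 5_000, 9_000),
        Fragment("chr1", 9_200, 12_000), Fragment("chr1", 50_000, 53_000),
        Fragment("chr2", 100, 4_100),
    ]
    print(summarize(cluster_fragments(frags, max_gap=500)))
```

The choice of `max_gap` directly changes how many zones are reported and how long they appear, so any such threshold deserves to be stated alongside zone statistics.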
How do these data compare with origin maps that are being constructed by others using small nascent strands as the starting material? In a recent study that also focused on the ENCODE regions of the genome (Cadoret et al. 2008), five biological replicates of small nascent strands were prepared utilizing λ-exonuclease to attempt to enrich for RNA-primed nascent strands over small broken template DNA. Careful analysis of the enrichment of the c-myc origin over a nonorigin control showed that individual purities varied more than sixfold among these preparations. When the preparations were hybridized individually to the microarrays, there was clearly a large amount of background hybridization from irrelevant DNA, but when a stringent cutoff was applied, some very clear positive signals were common to most of the preparations. In fact, a small number of these positions fell within EcoRI fragments that were recovered in the bubble-trapping protocol (Mesner et al., submitted).
In general, however, there is very little overlap between the origin map generated with these nascent strand preparations and that generated by bubble-containing fragments that we have obtained. In part, this could be due to the high cutoffs used in the nascent strand studies to select only the most robust signals, which also renders it likely that the maps presented in this study are far from saturating. Presumably, these signals correspond to only the most active sites in the genome and would select against the identification of initiation zones that encompass many sites with different efficiencies of utilization. The bubble-trapping scheme is also biased toward larger fragments whose bubbles would have longer dwell times and is biased against fragments with very off-centered bubbles. Therefore, our libraries may not be saturating either. While the latter problem could be partially alleviated by utilizing different restriction enzymes to prepare additional libraries, the cost of this approach is currently prohibitive.
In fact, these possibilities may be the major source of discordance between the two approaches: if neither library is saturating, then the degree of overlap will be minimal even if the signals recovered for each type of library might be valid. However, it is probably not the only source of discordance, since in some of the ENCODE regions there appear to be dense concentrations of robust small nascent strand signals that do not coincide with similarly dense trapped bubbles, and in other cases the opposite is true, i.e., bubbles are found where there are very few small nascent strand signals. Overall, our current estimate is that fewer than 15% of the positive signals from the two preparations overlap (Mesner et al., submitted). In our view, future studies by our own laboratory and by others should include some carefully crafted analyses of the preparations that are assumed to represent the majority of all start sites in the human genome before any serious take-home messages are proffered about their overlaps with each other or with genes, CpG islands, etc. In that way, together, we will all be able to make an important concerted contribution toward understanding the nature of mammalian origins and how they are distributed within the genome.
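The effect of non-saturation on overlap can be illustrated with a small simulation. It assumes, purely for illustration, that both libraries draw independent random subsets from the same set of genuine sites, with hypothetical sampling fractions of 30% and 20%; under that assumption the fraction of one library's signals found in the other is expected to be close to the other library's sampling fraction, even though every signal is valid. The site coordinates, sampling fractions, and helper names are hypothetical, and the sketch does not attempt to reproduce the fewer-than-15% figure, which also reflects the other sources of discordance described above.

```python
import random


def overlap_fraction(a, b):
    """Fraction of intervals in a that overlap at least one interval in b.
    Intervals are half-open (start, end) pairs on one chromosome."""
    if not a:
        return 0.0
    hits = sum(1 for s, e in a if any(s < e2 and s2 < e for s2, e2 in b))
    return hits / len(a)


def simulate_two_libraries(n_sites=1000, frac_a=0.3, frac_b=0.2, seed=1):
    """Two hypothetical libraries that each recover an independent random subset of the same true sites."""
    rng = random.Random(seed)
    # Hypothetical genuine sites: 2-kb intervals spaced 10 kb apart, so they never overlap each other.
    sites = [(i * 10_000, i * 10_000 + 2_000) for i in range(n_sites)]
    lib_a = [s for s in sites if rng.random() < frac_a]
    lib_b = [s for s in sites if rng.random() < frac_b]
    return lib_a, lib_b


if __name__ == "__main__":
    lib_a, lib_b = simulate_two_libraries()
    print(f"A signals found in B: {overlap_fraction(lib_a, lib_b):.2f}")
    print(f"B signals found in A: {overlap_fraction(lib_b, lib_a):.2f}")
```

The pairwise check is quadratic in the number of intervals, which is fine for a sketch; genome-scale comparisons would normally use sorted or indexed interval queries instead.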
We would like to thank all the members of the Hamlin laboratory past and present, who have toiled on this locus throughout the day and gone to sleep at night wondering how to perform the perfectly logical experiment. We also thank the NIH for continued support throughout the years (presently R01-GM26108 and R01-HG002937).