Main

Before cell division, the genome has to be faithfully replicated and DNA replication errors have to be averted to prevent developmental defects and tumorigenesis1,2,3. Genetics and biochemistry have revealed many DNA replication factors and their cooperation to ensure high-fidelity duplication of genomes. To probe DNA replicative processes in a genome-wide manner, DNA sequencing methods have been used to unravel DNA replication dynamics such as replication timing, replication fork directionality, origin and DNA polymerase usage4,5,6,7. Furthermore, single-molecule approaches have been widely used to monitor the behavior of individual replication forks, including replication speed, as well as fork stalling through detection by microscopy8,9,10 or long-read sequencing11,12,13. However, these methods randomly sample molecules from a large population of cells and are therefore insensitive to heterogeneity in replication dynamics between individual cells. We describe a method, single-cell 5-ethynyl-2′-deoxyuridine sequencing (scEdU-seq), that enables a high-resolution single-cell investigation of DNA replication fork dynamics.

Results

scEdU-seq reveals DNA replication profiles through S phase

To measure the heterogeneity of DNA replication fork dynamics, we developed scEdU-seq, a sequencing method to identify replicated nascent DNA in individual cells. scEdU-seq relies on metabolic labeling with the nucleotide analog 5-ethynyl-2′-deoxyuridine (EdU) and affinity capture of newly synthesized DNA fragments (Fig. 1a). We use copper(I)-catalyzed azide-alkyne cycloaddition click chemistry to covalently link a biotin moiety to the uracil base14. Following click, we sort single cells into a 384-well plate for single-cell processing. Subsequently, we digest the single-cell genome using a restriction enzyme (NlaIII) and end-repair fragments (large Klenow fragment and polynucleotide kinase), which are ligated to a T7 promoter containing adapters, cell-specific barcodes and a unique molecular identifier (UMI)15. After pooling cells, we biotin-capture the EdU-containing DNA molecules and release the non-EdU-modified strand by heat denaturation. Next, we regenerate the complementary strand via Klenow-mediated primer extension, followed by linear amplification via T7-dependent transcription. Finally, linearly amplified RNA is converted to complementary DNA by reverse transcription (RT) and amplified using a polymerase chain reaction (PCR) to prepare for Illumina sequencing.

Fig. 1: scEdU-seq reveals ordered DNA replication profiles throughout S phase.
figure 1

a, Representation of the scEdU-seq protocol. b, Z-scored genome-coverage tracks of log2(fold change) (early/late) S-phase samples for both scEdU-seq (500 cell bulk, upper) and Repli-Seq (400,000 cell bulk, lower) treated with 120 min of EdU. c, Dimensional distance between single cells by UMAP. Each dot is a single cell, lines indicate nearest neighbors, dots are colored by S-phase progression, DNA content (DAPI) or FUCCI markers. d, Scatter plot showing FUCCI reporters pseudo-colored by S-phase progression, determined using scEdU-seq tracks from a 15-min EdU pulse of cycling RPE-1 cells overlaid on the cell-cycle distribution of control RPE-1 cells (gray). e, Rolling mean of the z-scored fluorescence intensity of FUCCI reporters and DNA content (DAPI, y axis) versus S-phase progression (x axis) based on scEdU-seq tracks (single EdU pulse (15 min)). The ribbon indicates the standard deviation. f, Heatmap of scEdU-seq from single EdU pulse (15 min) maximum normalized log counts for 1,343 RPE-1 hTERT FUCCI cells ordered according to S-phase progression (y axis) and binned per 400 kb bins (x axis) for a 50 megabase region of chromosome 2. Heatmap showing log2(fold ratio) of early to late Repli-Seq indicating replication timing (upper) and a bar graph showing the replication origins (EdUseqHU19) of the same stretch of chromosome 2 (lower). Scaled z-scored intensities of FUCCI reporters and DNA content (DAPI) ordered by S-phase progression are shown on the right. g, Number of forks (y axis) per cell versus S-phase progression (x axis). The line represents the rolling-window median and the ribbon indicates the 95% confidence interval of the windows. h, Subsampling of unique reads per cell (x axis) versus detected number of DNA replication forks per single cell (y axis) for 15-min EdU-treated RPE-1 hTERT cells.

We compare scEdU-seq with Repli-Seq7 in human RPE-1 cells expressing human telomerase reverse transcriptase (hTERT) and find substantial overlap of the DNA replication profiles (Fig. 1b and Extended Data Fig. 1a–d). A feature of scEdU-seq is profiling non-nascent DNA in conjunction with nascent DNA (Extended Data Fig. 1e,f). As expected, we observe an anticorrelation between nascent and non-nascent DNA from the same sample. Next, we set out to generate DNA replication profiles in single cells, using fluorescence-activated cell sorting (FACS) enrichment for S-phase cells (based on 4′,6-diamidino-2-phenylindole (DAPI)) labeled with 15 min of EdU (Extended Data Fig. 1g). We observe that our sorting gates include all S-phase cells, because we detect no scEdU-seq+ cells at the extremities of the DAPI gates (Extended Data Fig. 1h,i). We reconstruct progression through S phase by ordering cells based on the overlapping scEdU-seq signal between single cells. Ordering of single cells by S-phase progression relies on the assumption that cells adhere to a similar replication timing, which has previously been shown for RPE-1 cells16. We sought to find an overall S-phase progression position for all sampled cells. First, we compare cells in a pairwise manner using the overlap coefficient, giving us a pairwise similarity score (Extended Data Fig. 1j and Methods). These pairwise scores are converted to distances and reduced using a one-dimensional uniform manifold approximation and projection (UMAP) (Fig. 1c). The average bootstrapped order of the cells on this UMAP line is the S-phase progression position. To validate the inferred S-phase progression, we perform scEdU-seq in cells expressing the fluorescent ubiquitination-based cell cycle indicator (FUCCI) reporter system, a fluorescent read-out reflecting cell-cycle stage, allowing us to record the S-phase position of each sequenced single cell. We observe that scEdU-seq based S-phase progression accurately reflects the position in the cell cycle shown by FUCCI17 reporters and DNA content (Fig. 1d,e and Extended Data Fig. 1k). This shows that we can use scEdU-seq based S-phase progression to order cells throughout DNA replication. Implementing single-cell S-phase progression ordering, we construct DNA replication tracks over the entirety of S phase from an ensemble of 1,343 RPE-1 hTERT FUCCI cells (Fig. 1f; genome-wide visualization provided at https://sceduseq.eu/). scEdU-seq based S-phase ordering is consistent with published bulk DNA replication timing using cell-cycle sorted populations (Pearson ρ = 0.8, Extended Data Fig. 1l,m)18. In addition, we observe that the start of scEdU-seq tracks (early S-phase progression) overlaps with genome positions that were identified by 5-ethynyl-2′-deoxyuridine sequencing with hydroxyurea (EdUseq-HU) as initiation zones of DNA replication in RPE-1 cells (Fig. 1f)19. As expected, we observe the greatest overlap between EdUseq-HU and scEdU-seq in the initial 5% of S-phase progression (Pearson ρ > 0.65; Extended Data Fig. 1n).

Regulation of the number of DNA replication forks is essential to limit replication stress and genomic instability19,20,21. Experimental quantification of the number of DNA replication forks is challenging. Initial efforts with imaging techniques8 resulted in an estimation of 5,000 DNA replication forks per human cell22. Quantification of individual DNA replication forks from single cells requires segmentation of the scEdU-seq signal into blocks of DNA replication tracks. To achieve these DNA replication fork calls, we fit a hidden Markov model (HMM) per cell to identify stretches that were generated by the same replisome (Methods and Extended Data Fig. 2a–f). We observe that RPE-1 cells have ~4,500 forks per cell (Fig. 1g), consistent with imaging studies22. Moreover, the number of DNA replication forks per chromosome correlates with the length of each human chromosome (Extended Data Fig. 2g). As expected, chromosomes 10 and 12 display an elevated number of DNA replication forks because these chromosomes are (partially) amplified in RPE-1 cells (chr10q and full chr12). The DNA replication fork number between single chromosomes follows the same trend over S phase suggesting similar regulation of DNA replication per chromosome (Extended Data Fig. 2h).

In addition to quantification of the number of forks, we use these analyses to compute the sensitivity of scEdU-seq. To assess sensitivity, we quantify the number of detected replication forks per cell following downsampling of unique reads (Fig. 1h and Extended Data Fig. 2i). We find that downsampling reads from cells results in a decrease in the number of detected forks; however, this decrease only becomes apparent after removing 25–30% of unique reads from single cells. We observe that for the majority of cells, we still detect the vast majority of DNA replication forks after removing 10,000 unique reads (Extended Data Fig. 2i), which is in line with the number of detected reads per replication fork (Extended Data Fig. 2f). To further explore scEdU-seq sensitivity, we attempt to compare the ability of scEdU-seq and single-cell Repli-Seq (scRepli-Seq) to detect DNA replication forks. Using a similar analysis, we observe twice as many DNA replication forks genome-wide in scEdU-seq in similarly staged cells during S phase (Extended Data Fig. 3a,b). For a representative genomic locus, we find that certain regions contain DNA replication forks in scEdU-seq that are not or less frequently detected in scRepli-Seq (Extended Data Fig. 3c,d). Taking these results together, scEdU-seq allows high-sensitivity profiling of DNA replication forks in single cells.

Double-pulse scEdU-seq allows DNA replication speed estimate

We have shown that scEdU-seq is able to detect the majority of DNA replication forks with high resolution. Next, we set out to determine replication speeds in single cells using scEdU-seq. Although replication speeds can be estimated from replication track widths, these depend on the sequencing depth (Extended Data Fig. 4a). By contrast, a double-pulse EdU-labeling strategy leads to increased accuracy of replication speeds in single cells and is less sensitive to the unique reads recovered per cell (Extended Data Fig. 4a–c).

After receiving two EdU pulses, the genome of a single cell in S phase is decorated with patches of EdU that are separated by distance Δx. The average replication speed is approximated by dividing this distance, Δx, by the EdU pulse center-to-center timespan Δt (labeling times). To systematically analyze these data, we use the pair correlation function23, which is defined as the distribution of all pairwise distances between EdU-containing reads in a single cell (Fig. 2a, right). For two EdU pulses, we expect this distribution to contain three main features that reflect different structures in the data. First, pairs of reads that were labeled within one pulse would generate a distribution of short distances (Fig. 2a, intrapulse distances in yellow). Next, read pairs from the same replication fork labeled in separate pulses would generate a distribution around distances at Δx=\(\bar{\Delta x}\) (Fig. 2a, interpulse distances in cyan). Finally, pairs of reads labeled in either pulse from different replication forks would yield a uniform background (Fig. 2a, interfork distances in pink). From the location of the interpulse distance distribution, we can estimate the average DNA replication fork progression in a single cell. We compute the average DNA replication fork speed for each cell by dividing the average distance traveled \(\,\bar{\Delta x}\) by the time between pulses Δt. Experimental pair correlations consistently behaved as expected. A single cell exposed to a single EdU pulse shows one maximum at Δx = 0 (Fig. 2b and Extended Data Fig. 4d,e). Conversely, cells exposed to a double EdU pulse (Δt = 45, 75 or 105 min) also show a second maximum. As expected, this second maximum shifts to larger values of Δx as Δt is increased (Fig. 2b, upper and Extended Data Fig. 4f).

Fig. 2: Double-pulse scEdU-seq allows DNA replication speed assessment.
figure 2

a, Schematic representation of the double-pulse labeling scheme and subsequent analysis. b, Line plots of the pair correlations of the single-pulse and double-pulse labeling scheme with Δt = 45 min (n = 376), Δt = 75 min (n = 347) and Δt = 105 min (n = 149). Each line is a single RPE-1 cell, where the x axis shows the binned distance ∆x (kb) and the y axis is the range-scaled density. A blue line indicates the mean density per bin. c, Pairwise distance between all reads from a Δt = 75 min scEdU-seq experiment of a single cell (green) and in silico bulk (100 cell bulk, blue). The thick line represents raw data and the thin line represents a LOESS regression of the raw data. d, Normalized count (y axis) of binned pairwise distances (x axis) between HMM-segmented pulses. Each line is a single cell and the magenta line indicates the median normalized count. Black and red lines indicate the median and mean size of a segmented fork in a 15 min EdU single-pulse scEdU-seq experiment. The yellow line indicates the median distance for the speed component in Δt = 75 min scEdU-seq data. e, DNA replication distance estimates (\(\bar{\Delta x}\,\)) in kb for cells (Δt = 45 min (n = 57), Δt = 75 min (n = 299) and Δt = 105 min (n = 103)) from b colored by labeling scheme. Upper, Ticks indicate the averages per labeling scheme. Lower, Distance estimates corrected for labeling resulting in DNA replication speeds (kb min−1). f, DNA replication speed estimates (\(\bar{\Delta x}\)) per chromosome (Δt = 45 min (n = 57), Δt = 75 min (n = 299) and Δt = 105 min (n = 103)) from b colored by labeling scheme. Upper, Ticks indicate the average per labeling scheme. Lower, Distance estimates corrected for labeling scheme resulting in DNA replication speeds (kb per min per chromosome).

We find that using single cells is critical; adding together pairwise distances from different cells drastically alters the signal in the pair correlation, creates noise and hinders the speed measurement (Fig. 2c; 1 cell versus 100 cells). This occurs even at low cell numbers (5–100 cells; Extended Data Fig. 4g). In addition, downsampling unique reads from a single cell does not hamper detection of the second maxima in the pair correlation (Extended Data Fig. 4g). This implies that many DNA replication forks contribute to the second maxima. A fraction of the pairwise distances between read pairs originates from two different forks, which potentially confounds our analyses (Fig. 2a; distances indicated in pink). Conceptually, we expect these effects to result in a background signal of the pair correlation function. To confirm this, we use our single-pulse data and recover a near-uniform distance distribution between forks per cell (Fig. 2d). This shows that a signal from separate forks does not interfere with detection of the second maximum (\(\bar{\Delta x}\)). In line with this observation, we do not detect a second maximum in the pair correlation function for a single pulse (Extended Data Fig. 4h,i).

To quantify DNA replication speeds from these data, we analyze the position of the second maximum (\(\,\bar{\Delta x}\,\)) of single cells exposed to a double EdU pulse (Methods and Fig. 2e, upper). As expected, this second maximum increases \(\,\bar{\Delta x}\) as a function of Δtt = 45, 75 or 105 min) and appears to increase linearly along Δx with the increase in Δt (Fig. 2e and Extended Data Fig. 4f). Indeed, when \(\,\bar{\Delta x}\) is divided by Δt we find similar DNA replication speeds for all labeling strategies (Fig. 2e, lower). This shows that the increase in \(\,\bar{\Delta x}\) as a function of Δt is caused by progressing DNA replication forks. Pairwise distances per chromosome show similar distributions of \(\bar{\Delta x}\) compared with all chromosomes combined (Fig. 2f).

The overall distribution of distances between reads is the result of multiple sources (Fig. 2a). To quantitatively model the individual sources, we use a mixture model. This allows estimation of the parameters of individual sources without the requirement for a priori assignment of each distance to one of the sources. We use a normal distribution to fit the interpulse distance (speed) component. Subsequently, we can obtain corresponding confidence intervals by bootstrapping with an expectation maximization algorithm and fit the model (Extended Data Fig. 5a–c)24. Finally, we observe a slight bias in DNA replication speeds as the timespan between labeling pulses increases (Fig. 2e,f). Simulations of double-pulse data show that higher speeds result in longer replication tracks, which have a greater weight and therefore contribute more to the resulting pair correlation (Extended Data Fig. 5d–g). We correct for this effect using the simulated data (Extended Data Fig. 5h). Taken together, we can use double-pulse EdU labeling combined with pair correlation analysis to identify DNA replication speeds in single cells.

Transcription limits DNA replication speeds in early S phase

We can measure DNA replication speeds in single cells by using a double EdU pulse in combination with a mixture model. Representative cells labeled with a double EdU pulse (Δt = 75 min) demonstrate that the mixture model (Fig. 3a, red line) accurately describes the experimental pair correlation (black line). Overall, we find DNA replication speeds in the expected range described in literature25. Unexpectedly, we observe great variability in replication speeds between individual cells (~1.5-fold difference, Fig. 2c,d). A large part of this variability is explained by the position of single cells in S phase. We observe a steady increase in DNA replication speeds suggesting acceleration of replication throughout S phase (Fig. 3b).

Fig. 3: Transcription limits DNA replication speeds in early S phase.
figure 3

a, Binned histogram of distances (kb, x axis) and range-scaled density (y axis) single-cell pair correlations (black line) with a fitted model (dashed red line) of representative early, middle and late S-phase (lower) RPE-1 cells labeled with the Δt = 75 min scheme. b, DNA replication speed (x axis) over S phase (y axis) in RPE-1 (n = 326) treated with DMSO subjected to the Δt = 75 min labeling scheme. Each dot is a cell, the line indicates a rolling-window median smooth and the ribbon the standard deviation. c, Sorted early (yellow) and sorted late (green) S-phase RPE-1 hTERT FUCCI cells superimposed on all detected single-cell events during FACS. d, Detected DNA fiber length analysis using DNA combing analysis from the indicated cell population labeled with IdU (20 min labeling, 100 fibers per replicate, n = 3, two-sided Student’s t-test, P < 2.99 × 10−23). The boxplot is defined by the median ± interquartile range (IQR) and whiskers are 1.5× IQR. e, Fraction of scEdU-seq domains covered by expressed genes (y-axis) over S-phase (x-axis) colored by expression level (stacked). f, Nascent RNA-sequencing from S-phase RPE-1 cells. Rolling-window smoothened normalized scEdU-seq coverage of genes (y-axis) over S-phase progression (x-axis) colored by expression level. g, Maximum normalized counts (y axis) of DNA replication speeds (x axis) inside or outside the indicated regions. Adjusted P values with two-sided t-test are: H3K27me3 (0.512, nonsignificant (NS)), H3K36me3 (3.0 × 10−3, ***), H3k9me3 (3.9 × 10−4, ****) and transcribed (1.2 × 10−4, ****). h, DNA replication speed over S phase in RPE-1 treated with DMSO (gray, n = 326) or DRB (cyan, n = 713) subjected to Δt = 75 min labeling scheme (lower left), the line and ribbon indicate the rolling-window median standard deviation. Difference in DNA replication speeds between DMSO and DRB in kb min−1 (y axis) over S-phase progression (x axis, upper left), marginal density (x axis) of DNA replication speed in kb min−1 (y axis) colored for DMSO-treated (gray) or DRB-treated cells (cyan, lower right) and cumulative distribution speeds (upper right). i, Maximum normalized counts of speeds inside or outside the indicated regions for DMSO-treated or DRB-treated RPE-1. Adjusted P values with two-sided t-test are: DMSO-H3K36me3 (3.0 × 10−3, ***), DMSO-transcribed (1.2 × 10−4, ****), DRB-H3K36me3 (1.1 × 10−2, **) and DRB-transcribed (0.398, NS). RFS, replication fork speed.

We can also estimate replication speeds with single-pulse labeling by quantifying the width of DNA replication tracks (Methods). We find similar DNA replication speeds and acceleration of replication throughout S phase in single-pulse data (Methods and Extended Data Fig. 6a–c). In addition, we confirm these phenotypes by DNA fiber analysis on sorted early and late S-phase RPE-1 hTERT cells (Fig. 3c,d and Extended Data Fig. 6d). Finally, we profile human-induced pluripotent stem cells and observe a similar increase in replication speeds during S phase (Extended Data Fig. 6e–g). Moreover, a recent study using long-read sequencing describes similar DNA replication speeds as well as an increase in DNA replication speeds throughout S phase26. These independent observations support the validity of the double-pulse EdU experiments.

Variability in DNA replication rates has been observed since the 1970s (refs. 27,28,29). In human cells, the acceleration of replication speeds throughout S phase has not been previously observed and the mechanism behind the reduced DNA replication speeds in early S phase remains elusive. Previous studies have shown that early replicating DNA is close to actively transcribed regions30. Because we observed the lowest replication speeds in early S phase, we hypothesize that lower speeds might be caused by transcription. To quantify transcription levels across S phase, we use single-cell nascent RNA sequencing data on RPE-1 cells (Extended Data Fig. 7a–d)31. As expected, we observe the highest levels of transcription in regions of the genome that overlap with the scEdU-seq signal at the start of S phase (Fig. 3e,f). In addition, both the number of transcribed regions and transcription levels decrease as DNA replication progresses over S phase. The presence of high levels of transcription in early S phase correlates with lower DNA replication speeds (Fig. 3e,f).

Indeed, transcribed regions in early S phase are replicated even more slowly than nontranscribed regions (Fig. 3e), which implies that transcription limits DNA replication speeds in early S phase. Nonetheless, we still observe the acceleration of DNA replication speeds over the course of S-phase progression outside transcribed regions (Extended Data Fig. 7e). This suggests that other factors control DNA replication speeds. To address this, we profile RPE-1 cells using chromatin immunocleavage sequencing15 for H3K36me3, H3K27me3 and H3K9me3 modifications, which align with the expected patterns of DNA replication timing32 (Extended Data Fig. 7f,g). We observe that both H3K36me3 and transcribed regions display lower DNA replication speeds (Fig. 3g), confirming our initial observation. Conversely, we find that H3K9me3-repressed chromatin confers higher DNA replication speeds. In addition, we observe that within H3K27me3 chromatin, DNA replication speeds do not drastically differ.

If active RNA polymerase II (RNAPII) transcription reduces DNA replication speeds, we would expect an increase in DNA replication speeds in early S phase by inhibiting transcription. To assess DNA replication speeds without active RNAPII transcription, we treat cells with the transcription inhibitor 5,6-dichlorobenzimidazole-1-β-d-ribofuranoside (DRB, 60 min) (Extended Data Fig. 8a–e) between the two EdU pulses, which does not alter either initiation zones or replication timing (Extended Data Fig. 8f,g). We observe an overall increase in DNA replication speeds in RNAPII-inhibited cells versus dimethylsulfoxide (DMSO)-treated cells (Fig. 3h, bottom). Furthermore, this difference in speed results from increased DNA replication speeds during early S phase (Fig. 3h, upper). Finally, we analyze DNA replication speeds in DRB-treated RPE-1 cells for transcribed regions as well as H3K36me3 chromatin (Fig. 3i). We find that we can increase DNA replication speeds in transcribed as well as H3K36me3 chromatin using the RNAPII inhibitor DRB. We find that the RNAPII inhibitor almost completely removes the influence of transcription and H3K36me3 on DNA replication speeds. In summary, DNA replication accelerates over S phase, in part as a result of RNAPII transcription decreasing replication speeds during early S phase.

Transcription-coupled damage decreases DNA replication speed

RNAPII activity has been correlated with a variety of types of DNA damage; for example, by generating single-strand breaks through topoisomerase I cleavage complexes or repair of bulky adducts by transcription-coupled nucleotide excision repair33,34. Moreover, conflicts between the DNA replication fork and transcription machinery lead to the formation of RNA:DNA hybrids, which result in double-strand breaks if improperly handled35. Indeed, short inhibition of RNAPII (1 h) during S phase results in a reduction in DNA damage as assayed by flow cytometry (Fig. 4a; γH2AX). This suggests that RNAPII activity, at least in part, causes transcription-coupled DNA damage during S phase.

Fig. 4: Transcription-coupled damage decreases DNA replication speeds.
figure 4

a, Z-scored log10-transformed ADP-ribose (left) or γH2AX (right) intensities for DMSO-treated (gray) or DRB-treated cells (cyan) and bootstrapped mean of intensities (lower). b, DNA replication speed over S phase in RPE-1 treated with DMSO (gray, n = 326) or PARPi (24 h, green, n = 766) (lower left) showing the difference in speeds between DMSO and PARP in kb min−1 (y axis) over S phase (x axis, upper left), density (x axis) of DNA replication speeds in kb min−1 (y axis) colored for DMSO- treated (gray) or PARP-treated cells (green) and cumulative frequency distribution of speeds (upper right). c, Cartoon of signaling and repair of single-strand breaks. d,e, Z-scored log10-transformed ADP-ribose (d) or γH2AX (e) intensities (x axis) for WT (yellow) or XRCC1Δ RPE-1 cells (orange) and bootstrapped mean of intensities (lower). A two-sided Student’s t-test with multiple testing correction (Bonferroni) was performed (n = 3; pan adenosine diphosphate ribose (pADPr), P < 2.81 × 10−8; γH2AX, P < 3.05 × 10−15). f, Percentage of EdU+ cells (y axis) for DMSO versus PARPi (24 h) (x axis) colored by WT (yellow) or XRCC1Δ (red) (n = 3; two-sided Student’s t-test; ***P = 5.82 × 10−4), boxplots are defined by the median ± IQR and whiskers are 1.5× IQR. g, DNA replication speed over S phase in WT (yellow, n = 326) or XRCC1Δ (red, n = 187) (lower left). Each dot is a cell, the line indicates a rolling-window median smooth and the ribbon the standard deviation. Difference in DNA replication speeds between WT and XRCC1Δ in kb min−1 (y axis) over S phase (x axis, upper left). The histogram shows DNA replication speeds in kb min−1 (y axis) colored for WT (yellow) or XRCC1Δ (red, lower right) and cumulative frequency distribution (upper right). h, DNA replication speed over S phase in XRCC1Δ treated with DMSO (red, n = 187) or PARP (4 h, green, n = 393) (lower left). Each dot is a cell, the line indicates a rolling-window median smooth and the ribbon the standard deviation. Difference in DNA replication speeds between DMSO and PARPi (4 h) in kb min−1 (y axis) over S phase (x axis, upper left), marginal density (x axis) of DNA replication speeds in kb min−1 (y axis) colored for DMSO-treated (red) or PARP-treated cells (green, lower right) and cumulative frequency distribution of speeds (upper right).

The activity of the DNA damage sensor poly(ADP-ribose) polymerase 1 (PARP-1) is stimulated by a wide variety of DNA damage lesions36. Because PARP-1 activity has previously been linked to DNA replication speeds25, we reason that the decrease in DNA replication speed in early S phase might be caused by transcription-coupled DNA damage and subsequent PARP activation. We observe a decrease in the level of pan ADP-ribose, the modification deposited by PARP enzymes, upon PARP inhibition (Fig. 4a, pan ADP-ribose). This suggests that RNAPII transcription during S phase not only induces DNA damage, but also activates PARP.

To explore how transcription-coupled DNA damage might affect DNA replication speed in single cells, we make use of the PARP inhibitor Olaparib. First, we treat wild-type (WT) RPE-1 cells with a PARP inhibitor (PARPi) and observe very similar DNA replication speed behavior compared with RNAPII inhibition (Fig. 4b and Extended Data Fig. 9a–e). Overall DNA replication speeds are higher in PARP-inhibited cells without altering either initiation zones or replication timing (Extended Data Fig. 9f). In addition, the most notable difference in DNA replication speeds occurs in early S phase suggesting a connection to RNAPII transcription. To validate these findings, we use the single-pulse strategy and fork-width analysis and find higher DNA replication speeds, specifically in early S phase, in PARPi treatment compared with DMSO treatment (Extended Data Fig. 9g,h).

To further address the role of PARP activity in regulating DNA replication speeds, we hyperactivate PARP-1 by generating an RPE-1 cell line in which the gene XRCC1 was knocked out (XRCC1Δ RPE-1) (Extended Data Fig. 9i,j). XRCC1 protein is required for efficient repair of DNA damage. In the absence of this protein, an increase in steady-state levels of DNA damage causes PARP hyperactivation, which eventually leads to cerebral ataxia37 (Fig. 4c–e). XRCC1Δ cells have a lower proportion of EdU+ cells compared with WT cells, which is partially mitigated by PARPi treatment (Fig. 4f). This implies that excessive PARP signaling in XRCC1Δ cells result in lower DNA replication speeds. In line with this observation, XRCC1Δ cells display overall lower DNA replication speeds compared with WT cells (Fig. 4g and Extended Data Fig. 9k,l). In contrast to RNAPII or PARP inhibition, we observe a ubiquitous decrease in DNA replication speeds in all cells, not just early S-phase cells. This suggests that hyperactivation of PARP, outside transcribed regions, results in lower DNA replication speeds. In addition, we can rescue the global decrease in DNA replication speeds in XRCC1Δ RPE-1 by PARPi treatment (Fig. 4h).

In addition to these changes in DNA replication speeds, we also observe that the variability in speed is altered within an individual cell. We find that inhibiting transcription has the largest overall effect of reducing variability in DNA replication speeds within a single cell (Extended Data Fig. 10a–d). In line with this, we observe that the variability in DNA replication speeds is higher in early S phase (WT DMSO) and is dramatically decreased upon addition of the inhibitor of RNA polymerase II DRB (Extended Data Fig. 10a–d). Moreover, elevated DNA damage (XRCC1Δ DMSO) increases the variability in DNA replication speeds compared with the steady-state (WT DMSO). The variability of replication speeds in both WT and XRCC1Δ cells are decreased following addition of the PARP inhibitor Olaparib (Extended Data Fig. 10a–d). In WT cells, this decrease seems to be concentrated in early S phase (Extended Data Fig. 10e). Conversely, XRCC1Δ cells display higher levels of variability throughout S phase, which are diminished throughout S phase with PARPi (Extended Data Fig. 10e). This indicates that it is not only transcription that impacts DNA replication speeds and variability within single cells. Nonetheless, these results imply that transcription-coupled damage increases the variability in speed between DNA replication forks in early S-phase cells. Furthermore, our findings suggest that PARP activity is critical in regulating DNA replication speeds in response to transcription-coupled DNA damage.

Discussion

We developed a method to profile DNA replication forks and their speeds in single cells. We observe that DNA replication speeds accelerate during S phase. Reduced DNA replication speeds at the start of S phase occur in genomic regions with high levels of RNAPII transcription. Inhibition of RNAPII transcription increases DNA replication speeds at these locations. We find that inhibition of RNAPII results in both lower PARP activity and less DNA damage. We continue to show that lowering PARP activity allows for higher DNA replication speeds, specifically in early S phase. In addition, the hyperactivation of PARP in RPE-1 cells lacking XRCC1 results in a genome-wide decrease in DNA replication speeds. We can reverse this decrease by lowering PARP activity, indicating a direct role for PARPs in regulating DNA replication speeds. Overall, this implies that transcription-coupled DNA damage increases PARP activity, which in turn reduces DNA replication speed.

Our data suggest crosstalk between DNA replication fork speeds and transcription through the activity of PARP enzymes. Proteomic profiling has identified DNA replication and transcription as the two biological processes regulated through ADP-ribosylation by PARP enzymes38. Several replication factors are known to regulate fork speeds25,39,40. However, which specific components are regulated by PARPs to reduce replication fork speeds remains to be discovered. Outside transcribed regions, we have identified several other factors contributing to DNA replication speeds. Previous studies have implied that chromatin states control replication speeds by studying the inactive X-chromosome in hybrid mice41. These observations and previous findings provide an interesting avenue for future studies

There are several advantages to using the nucleotide analog EdU. scEdU-seq does not require internal normalization to sorted G1 cells. Therefore, we can extract DNA replication profiles from heterogeneous samples from highly divergent sources bearing different karyotypes, as evidenced by chromosomal gains 10q and 12 in RPE-1 hTERT cells (Extended Data Fig. 2g,h). This feature of scEdU-seq enables DNA replication research at the single-cell level in non-copy number variable material, which is of particular interest in samples with replication stress mutational signatures42. Alternatively, unscheduled DNA replication by a variety of DNA repair pathways could potentially be detected using scEdU-seq. Furthermore, reagent costs for scEdU-seq are lower than for methods based on whole-genome sequencing because only the nascent DNA is sequenced. In combination with smaller reaction volumes and a lack of reliance on commercial kits, this results in increased scalability. The scalability of scEdU-seq enables small-scale chemical/genetic screens to identify regulators of DNA replication. These screens might prove useful in uncovering the mechanisms of action of Olaparib (PARPi) on DNA replication speeds in transcribed regions of the genome. Finally, extracting transcriptome profiles in conjunction with DNA replication profiles will allow the uncovering of molecular crosstalk as well as the identification and characterization of rare DNA replication events.

In addition to using scEdU-seq to profile DNA replication speeds, long-read sequencing has been used to determine replication dynamics11,12,13,26. In terms of cost, scEdU-seq and nanopore-based long-read sequencing are comparable. Long-read sequencing has single base pair resolution, whereas scEdU-seq is limited by the distance between restriction enzyme digest sites in the genome (~250 bp for NlaIII). Therefore, scEdU-seq cannot quantify replication speed in small regions of the genome such as promoters, enhancers and R-loops (~0.2–2 kb in length)35,43. Nanopore-based methods are better suited for this; however, this technology is not yet applicable at the single-cell level. In addition, determining replication timing (S-phase progression) based solely on long-read sequencing is challenging. Moreover, only a fraction of the long reads are labeled with nucleotide analogs, which results in ~99% of reads being unlabeled26. Therefore, which method is better depends on the particular hypothesis. We feel that scEdU-seq and nanopore-based technology are highly complementary.

Another complementary technology, scRepli-Seq, enables the quantification of replication timing in single cells. scRepli-Seq can be performed on fixed materials because it does not require the incorporation of a synthetic uridine analog. In addition, scRepli-Seq can be performed at a higher throughput using droplet-based methods, easily profiling tenfold more cells compared with scEdU-seq. Finally, the availability of commercial kits for single-cell DNA sequencing enables increased accessibility of scRepli-Seq to the community44,45. Conversely, we feel that scEdU-seq has advantages over scRepli-Seq. As previously stated, there is no need for copy number correction in scEdU-seq. Furthermore, scEdU-seq reagent costs are lower owing to the smaller custom reactions per cell and because fewer sequencing reads are required per cell, resulting in roughly 50-fold lower costs for scEdU-seq compared with scRepli-Seq (Supplementary Table 1).

A potential limitation of scEdU-seq (and related techniques based on metabolic labeling of DNA) is that it relies on the incorporation of nonnatural nucleotides, which might induce a stress response or affect DNA repair46,47,48. Searching for endogenous read-outs for active DNA replication in single cells is important to circumvent such a potential limitation. In addition, scEdU-seq cannot be applied to bio-banked material and other situations in which nucleotide analog labeling is not feasible. Moreover, cells with low EdU incorporation rates (for example, cells at the very beginning of S phase) might be excluded during the quality control step, which could possibly be mitigated by longer labeling or increased enrichment by FACS. Of note for the double-pulse EdU-labeling experiments, we are not able to measure the speed of all active forks in the cell. For instance, when a fork is labeled during the first pulse and is annihilated by another fork or when an initiation site fires before the second EdU pulse, we are unable to detect DNA replication speeds. We hope that scEdU-seq will enable the identification of single-cell DNA replication dynamics as well as replication speeds in a wide range of biological systems.

Methods

Cell lines and reagents

RPE-1 hTERT FUCCI, RPE-1 hTERT and RPE-1 hTERT XRCC1Δ cells were cultured in DMEM/F12 supplemented with 10% FBS (Gibco), 1× GlutaMAX (Gibco) and 1× Pen-Strep (Gibco) at 37 °C with 5% CO2. Human-induced pluripotent stem cells were cultured on vitronectin-coated plates in Essential E8 medium supplemented with penicillin/streptomycin and Revitacell (first 24 h, ThermoFisher Scientific). RPE-1 cells routinely tested negative for Mycoplasma contamination and were not authenticated. Cell counting was performed with the Bio-Rad TC-20 Cell Counter. The following chemicals were used: 5-ethynyluridine (EU) (100 μM, Invitrogen), EdU (10 μM, Invitrogen), Olaparib (AZD2281, 10 μM, Cell Signaling Technology), DRB (10 μM, Sigma), DAPI (2 μg ml−1, ThermoFisher Scientific) and SN-38 (at the indicated concentrations; SelleckChem)

XRCC1Δ knockout generation in RPE-1 hTERT cells

Guide RNA was designed using CRISPOR against the second exon of XRCC1. The primers 5′-CACCGAGACACTTACCGAAAATGGC-3′ and 5′-AAACGCCATTTTCGGTAAGTGTCTC-3′ (Integrated DNA Technologies) were cloned into pX330 (ref. 49). Cells were co-transfected with pDonor-Blast50 and pX330-XRCC1, allowed to recover for 72 h and selected with blasticidin. Clones were picked and expanded. The picked clones were validated by western blot analysis as previously described51 and probed with CDK4 (1:1,000; Santa-Cruz Biotechnologies, cat. no. sc-260) and XRCC1 (1:1,000; Abcam, cat. no. ab1838) primary antibodies. Secondary goat-anti-rabbit and goat-anti-mouse horseradish peroxidase-conjugated antibodies were used for detection on Bio-Rad Gel-Doc (1:2,000; DAKO).

SN-38 proliferation assay

Some 500 RPE-1 hTERT or RPE-1 hTERT XRCC1Δ cells were plated in a 96-well plate and treated with increasing concentrations of SN-38 for 120 h. Cell viability was measured at the end of the experiment with a CellTiter-Glo cell viability assay according to the manufacturer’s protocol.

Single- and double-pulse EdU treatment

For single-pulse experiments, 1.5 × 106 cells were treated with 10 μM EdU for 15 min, trypsinized and washed with 1× PBS buffer, followed by fixation in 75% ice-cold ethanol. Double-pulse experiments were performed by treating 1.5 × 106 cells with 10 μM EdU for 15 min. Cells were subsequently washed three times with DMEM/F12 medium and allowed to recover for the indicated periods (for example, 30, 60 or 90 min). Finally, cells were treated with a second pulse of EdU (10 μM), trypsinized, washed in PBS and fixed in 4 ml of 75% ethanol.

Fixed cells were stored at −20 °C for up to 24 h. For longer storage periods of up to 3 months, cells were stored in 4 ml of storage buffer (150 mM NaCl, 20 mM HEPES, 2 mM EDTA, 25 mM Spermidine with 10% DMSO) at −20 °C.

Azide–PEG3–biotin EdU-click reaction

Eppendorf Protein Lo-Bind 0.5-ml tubes were precoated with 0.25% BSA in PBS. Afterwards 500 μl of cells, in either 75% ethanol or storage buffer, were pelleted for 3 min at 600g. The cells were resuspended in 0.25% BSA in PBS and left to block for 30 min at 4 °C. Following blocking, cells were pelleted and the click reaction was performed in situ in 50-µl reactions using the EdU-Click 647 imaging kit (Invitrogen) according to the manufacturer’s protocol with some alterations. Azide-647 was replaced with azide–PEG3–biotin conjugate (Sigma, 2 mM) and supplemented with 6 mM tris((1-hydroxy-propyl-1H-1,2,3-triazol-4-yl)methyl)amine (Jena Bioscience).

FACS

Following the click reaction, RPE-1 cells were washed once in 1× PBS, resuspended in PBS with 0.25% BSA (ThermoFisher Scientific) and 10 µg ml−1 DAPI, and passed through a 20-µm mesh. Single cells were index sorted into a 384-well plate using BD FACS Influx with the following settings: sort objective single cells, a drop envelope of 1.0 drop, a phase mask of 10/16, a maximum of 16 extra coincidence bits, a drop frequency of 38 kHz, a 100-µm nozzle with a pressure of 18 pounds per square inch and a flow rate of ~100 events per s, which results in a minimum sorting time of ~5 min per plate.

Doublets and debris were excluded by using the forward and side scatter and the DAPI channel. For the hTERT RPE-1 FUCCI cells, measurements in the DAPI channel were used to enrich S-phase cells. The intensities in the monomeric Azami-Green and monomeric Kusabira-Orange 2 as well as DAPI channels were acquired and later used for data analysis. Single cells were sorted into 384-well hardshell plates (Bio-Rad) containing 5 µl of light mineral oil (Sigma-Aldrich).

Library preparation

Library construction progressed through three general steps (Fig. 1a). Reagents were dispensed to 384-microwell plates using either Nanodrop II (Innovadyne Technologies) or Mosquito (TTP Labtech). Plates were spun at 2,000g for 2 min after each liquid transfer step.

Cell lysis and NlaIII digestion

After sorting, single cells were lysed in 100 nl of lysis mix (10 nl of 1× CutSmart buffer (NEB), 10 nl of proteinase K (Ambion), 80 nl of H2O). Plates were incubated for 2 h at 55 °C and the proteinase K was heat-inactivated for 20 min at 80 °C. The genome was digested with 100 nl of NlaIII mix (10 nl of 1× CutSmart buffer (NEB), 10 nl of NlaIII (Ambion), 80 nl of nuclease-free H2O) at 37 °C for 4 h and heat-inactivated for 30 min at 65 °C.

End-repair and A-tailing followed by adapter ligation

To end-repair NlaIII overhang, we next incubated single cells with 100 nl of end-repair mix (1.6 nl of Klenow large fragment (NEB), 1.6 nl of T4 polynucleotide kinase reaction buffer (NEB), 4 nl of 10 mM deoxynucleotide triphosphates (dNTPs), 2.3 nl of 100 mM ATP, 6.6 nl of 25 mM MgCl2, 5 nl of polyethylene glycol 8000 (PEG8000; 50%, NEB), 1.2 nl of 20 ng ml−1 BSA (NEB), 23.3 nl of 10× polynucleotide kinase reaction buffer (NEB) and 54.2 nl of nuclease-free H2O) for 30 min at 37 °C and heat-inactivated both enzymes for 20 min at 75 °C. To ligate adapters with a T-overhang, we A-tailed the end-repaired genomic DNA fragments with 100 nl of A-tailing mix (0.66 nl of AmpliTaq 360 (ThermoFisher Scientific), 0.66 nl of 100 mM dATP, 16.6 nl of 1 M KCl, 5 nl of PEG8000 (50%, NEB) 0.5 nl of BSA (20 ng/ml, NEB) and 77.2 nl of nuclease-free H2O) for 15 min at 72 °C. Finally, A-tailed fragments were ligated to 50 nl of 5 mM T7 promoter containing adapters15 with cell barcodes and UMI (Supplementary Table 2) using 150 nl of ligation mix (25 nl of T4 DNA ligase (400,000U, NEB), 3.5 nl of MgCl2, 10.5 nl of Tris buffer pH 7.5 (1 M, Gibco), 5.25 nl of dithiothreitol (1 M, ThermoFisher Scientific), 3.5 nl of ATP (100 mM, ThermoFisher Scientific), 10 nl of PEG8000 (50%, NEB), 1 nl of BSA (20 ng ml−1 NEB) and 91.25 nl of nuclease-free H2O) for 20 min at 4 °C followed by 16 h at 16 °C and heat-inactivation for 20 min at 65 °C.

Pooling and purification of EdU fragments

The contents of each plate were collected into VBLOK200 reservoirs precoated with mineral oil (ClickBio) by centrifuging at 300g for 1 min. The aqueous phase was collected and separated from any residual mineral oil by centrifugation. EdU–PEG3–biotin containing DNA molecules was affinity purified using MyOne Streptavidin C1 magnetic beads (Invitrogen) according to the manufacturer’s protocol. Subsequently, we retrieved the complementary strand of the EdU–PEG3–biotin containing the DNA strand by heat denaturation at 95 °C. While ramping down the temperature (0.1 °C s−1) to 20 °C, we annealed an oligo (5′-ATGCCGGTAATACGACTCAC-3′) complimentary to the constant adapter sequence region in oligo annealing buffer (20 mM Tris pH 8, 1 mM MgCl2, 100 mM NaCl). Next, we extended the primer to generate double-strand DNA using Klenow large fragment mix (1× NEB Buffer 2, 50 mM dNTPs, 0.5 U of Klenow large fragment) for 45 min at 25 °C, and heat-inactivation for 20 min at 75 °C. DNA fragments were purified with Ampure XP beads (Beckman Coulter) at a sample to beads ratio 1:1 and resuspended in 7 μl of nuclease-free H2O.

Library amplification by in-vitro transcription and PCR

Preamplified libraries were linearly amplified using a MEGAscript T7 Transcription Kit (ThermoFisher Scientific) for 12 h at 37 °C. Template DNA was removed by the addition of 2 µl of TurboDNAse (ThermoFisher Scientific) for 15 min at 37 °C. Amplified RNA (aRNA) was fragmented for 2 min at 94 °C with fragmentation buffer (5× concentrated; 200 mM Tris-acetate pH 8.1, 500 mM KOAc, 150 mM MgOAc). aRNA was directly cooled to 4 °C on ice and 50 mM EDTA was added to stop fragmentation. The fragmented RNA is purified using RNA Clean XP beads (Beckman Coulter) at a beads to sample ratio of 1:1, and eluted in 12 μl of H2O. Next, 5 μl of aRNA was converted to cDNA by RT in two steps. First, the RNA was primed for RT by adding 0.5 µl of dNTPs (10 mM) and 1 µl of random hexamer RT primer 20 µM (5′-GCCTTGGCACCCGAGAATTCCANNNNNN-3′) at 65 °C for 5 min followed by direct cooling on ice. Second, RT was performed by the addition of 2 µl of first-strand buffer, 1 µl of 0.1 M dithiothreitol, 0.5 µl of RNAseOUT and 0.5 µl of Superscript II, and incubating the mixture at 25 °C for 10 min, followed by 60 min at 42 °C and 20 min at 70 °C. Single-strand cDNA was purified from aRNA through incubation with 0.5 µl of RNAseA (ThermoFisher Scientific) for 30 min at 37 °C. Finally, cDNA was amplified by PCR, which also attaches the Illumina small RNA barcodes and handles (Supplementary Table 3), by adding 25 µl of NEBNext Ultra II Q5 Master Mix (NEB), 11 µl of H2O and 2 µl of RP1 and RPIx primers (10 µM). DNA fragments were purified twice with Ampure XP beads (Beckman Coulter) at a sample to beads ratio 0.8:1, and were resuspended in 10 μl of nuclease-free H2O. The abundance and quality of the final library were assessed by Qubit and Bioanalyzer.

Sequencing

Libraries were sequenced using v2.5 chemistry on a NextSeq500 or NextSeq2000 (Illumina; NextSeq control software v.2.2.0.4; RTA v.2.4.11) with 100 cycles for read 1 (cell index and UMI) and 100 cycles for read 2 (sample index).

DNA fiber assay

RPE-1 hTERT FUCCI cells were labeled with 250 μM 5-iodo-2-deoxyuridine (IdU) for 20 min and then chased with 2 mM thymidine. Following trypsinization, cells were resuspended in 0.2% BSA–PBS0 and sorted for specified S-phase fractions. Subsequently, RPE-1 hTERT cells were lysed on microscopy slides in lysis buffer (0.5% SDS, 200 mM Tris pH 7.4, 50 mM EDTA). DNA fibers were spread by tilting the slide and were subsequently air dried and fixed in methanol/acetic acid (3:1) for 10 min. For immunolabeling, spreads were treated with 2.5 M HCl for 90 min. IdU was detected by staining with mouse-anti-BrdU (1:250; BD Biosciences, cat. no. 347580) for 1 h and was further incubated with Alexa Fluor 647-conjugated anti-mouse immunoglobulin G (1:500) for 90 min. Images were acquired on a Zeiss Axio Imager Z2 and fiber lengths were scored using ImageJ.

Flow cytometry

Apoptosis analysis was performed using an Annexin-V-APC kit (BioLegend) as described in the manufacturer’s protocol for all treatment conditions and cell lines. Cell-cycle analysis was performed by fixing RPE-1 hTERT FUCCI, RPE-1 hTERT or RPE-1 hTERT XRCC1Δ cells in 70% ethanol and counterstaining with DAPI (10 μg ml−1, 20 min on ice). For nascent RNA labeling, we treated cells with EU (200 μM) for 1 h and fixed the cells in 75% ethanol. For DNA replication labeling, we incubated cells for 30 min with 10 μM EdU. For both EU and EdU labeling, we used the EdU-Click 647 Imaging Kit according to the manufacturer’s protocol. For γH2AX staining, we used the fluorescein isothiocyanate-conjugated γH2AX (1:500, Millipore). panADP-ribose binding reagent (1:1,500; Sigma-Aldrich, cat. no. MABE1016) was used in combination with donkey anti-rabbit immunoglobulin G secondary antibody with different Alexa Fluor conjugations depending on the experimental conditions and cell line used.

Read processing

Processing of raw fastq to count tables was performed using SingleCellMultiOmics v.0.1.25 (https://github.com/BuysDB/SingleCellMultiOmics).

First, fastq files were demultiplexed, which adds UMI, cell, sample and sequencing indices to the header of the fastq. Cell barcodes and UMIs with a hamming distance of 1 were collapsed. Next, the adapter sequences were trimmed from each read with cutadapt. Subsequently, reads were mapped with BWA using the mem function to Ensembl release 97, GRCh38.p12 for Homo sapiens, the bam outputs were sorted with samtools. Mapped reads were subjected to molecule assignment, which generates tags for NlaIII restriction site position and integrates the cell barcode, UMI, library, strand and genomic position of NlaIII restriction site into one tag. This integrated molecule tag allows for deduplication of reads and the generation of long-format tables. These tables were filtered for the presence of a NlaIII restriction site, a mapping quality >30, the molecule has a pair of reads assigned, the molecule is unique and should not have alternative alignment positions in the genome.

Single-cell DNA replication analyses and plotting

All data analysis was done in R using the tidyverse and data.table packages unless otherwise stated.

S-phase ordering

First, cells were filtered by the average counts per 100 kb bin with a lower threshold (single pulse 0.37; double pulse 0.08) and an upper threshold (single pulse 2.72; double pulse 12.18), and deviance of Poisson behavior defined as:

$$\log ({{\mathrm{Coefficient of Variation}}}) > -0.5 {\times} {\log ({\mathrm{{mean}}})+{{\mathrm{threshold}}}}$$

where the threshold was set to 0.1. An exponential mixture model was fitted on the distances between successive reads for each single cell separately using the R package flexmix. Subsequently, reads with a posterior probability of >0.5 for the distances to their first neighbors were used for S-phase ordering. Next, we performed a gaussian kernel smoothing (s.d. of 8333.333) for the remaining reads, after which the pairwise overlap coefficient was calculated between all cells on a per chromosome basis. The overlap coefficient was converted to a distance as follows:

$${{\mathrm{Distance}}}=\min\left(\frac{1}{{{\mathrm{score}}}}-1,\,1000\right)$$

and averaged over chromosomes. The resulting distance was embedded in one dimension using UMAP implemented by the R package umap. This UMAP computation was repeated 100 times and the resulting UMAP axis was converted to a z-score. To prevent flipping of the direction of S-phase progression, runs that had an average Spearman rank correlation of <0.85 with the other runs were discarded. Finally, to determine whether a cell could be placed on the ordering definitively, placings were considered clustered if the smallest distance between two successive placings was <0.1, and cells were kept if the biggest cluster contained at least 80% of the successful runs.

HMM pulse segmentation

To determine the part of the genome that is undergoing replication in a single-pulse experiment, a two-state HMM was used to segment the genome into foreground and background. We used the R package mhsmm52 to fit a hidden semi-Markov model with exponential emission distributions and a gamma sojourn distribution per cell on the distances between neighboring reads (Extended Data Fig. 2a), where each of the chromosomes was used as a separate observation. Reads generated by the same polymerase are expected to be close together on the genome, closer in general than neighboring forks or spurious background reads. Subsequent distances between reads are assigned to the foreground state (Extended Data Fig. 2a, yellow) and background (Extended Data Fig. 2a, red). To get the most likely sequence of states, the viterbi algorithm implemented in the mhsmm package was used then. Thus, reads that were generated by the same polymerase have a higher posterior probability (extracted from the viterbi algorithm, Extended Data Fig. 2b) of being close together on the genome, closer in general than neighboring forks or spurious background reads. Subsequent reads that are assigned to the foreground state by the model are considered a track traveled by a single polymerase. We found that DNA replication fork tracks contain, on average, ~7 scEdU-seq reads (Extended Data Fig. 2c) and the coverage is around 5% of the width of these DNA replication tracks (Extended Data Fig. 2d). Finally, we found that our DNA replication fork calls are, on average, 29,000 bp in length for a 15-min pulse of EdU in RPE-1 TERT cells (Extended Data Fig. 2d) and the majority of DNA replication forks have three or more reads per single cell (Extended Data Fig. 2f).

However, there are caveats with regards to the quantification of the number of forks per cell, which includes undercounting (that is, false negatives) that arise from low sampling of reads from a labeled stretch of EdU. Furthermore, the single-pulse labeling strategy used in this work (15 min of EdU) has a lower limit resolution for the DNA replication tracks (~15–20 kb). Several other methods such as Okazaki fragments sequencing (OK-seq) and fiber analyses enable more fine-grained analysis of processes occurring at higher resolution. For instance, within a single initiation zone (average ~30 kb) multiple initiation sites fire53. Using scEdU-seq, we cannot resolve these individual initiation sites using the current labeling schemes, library preparation and analysis. Finally, several DNA repair processes are known to fill in considerable stretches of DNA, which potentially result in false-positive DNA replication tracks54.

Comparison between Repli-Seq and bulk scEdU-seq

Bulk scEdU-seq samples were generated by collecting 500 early or late S-phase RPE-1 hTERT FUCCI cells treated with EdU for 120 min. Cells were processed similarly to scEdU-seq libraries. We retrieved bulk Repli-Seq for RPE-1 hTERT cells from the 4D Nucleome program, which was generated by the Gilbert lab. Subsequently, the samples were binned with a 50-kb resolution and reads per bin were z-scored per sample. Z-scores were used for comparative plotting of traces as well as computing the Spearman correlation between samples.

Comparison between Repli-Seq and scEdU-seq

Segmented pulses (‘HMM pulse segmentation’) from the single-pulse dataset were overlapped with 10-kb bins from the Repli-Seq dataset using the foverlaps function from data.table. The Repli-Seq RT was calculated as follows:

$${{\mathrm{RT}}\; {\mathrm{score}}}=\frac{{{\mathrm{early}}}-{{\mathrm{late}}}}{{{\mathrm{early}}}+{{\mathrm{late}}}}$$

The Pearson correlation was then calculated between binned S-phase progression and the Repli-Seq RT score.

EdUseq-HU comparison

Raw early DNA replication origin data (EdUseq-HU) for RPE-1 hTERT was used from ref. 19 (BioProject PRJNA397123). Raw FASTQ files were trimmed and mapped to Ensembl release 97, GRCh38.p12 with BWA. To compare scEdU-seq and EdUseq-HU both datasets were binned at 100-kb resolution. Next, the cumulative number of reads over S-phase progression per bin was calculated, and was used to calculate the binwise Pearson correlation per cumulative reads over S phase.

DNA replication track heatmaps

To visualize DNA replication tracks, we used a heatmap to plot scEdU-seq signal per single cell over S-phase progression. First, we ordered single cells based on their S-phase progression (‘S-phase ordering’). Next, we selected a segment of the genome for visualization of DNA replication tracks containing both early and late replication domains (for example, chromosome 2 from 10 to 50 Mb). Subsequently, we normalized the scEdU-seq signal in 50-kb bins to the maximum value observed for that chromosome (that is, 0–1).

Simulation of single- versus double-pulse speed quantification

The number of reads per pulse was drawn from a Poisson distribution of varying intensity to simulate average sampling depth per replication track. Subsequently, these reads were placed at locations from a uniform distribution within a replication track (that is, 0 or 1 for a single pulse and 2 or 3 for a double pulse). Placement of the reads by the uniform distribution resulted in a ground truth DNA replication speed of 1 kb min−1 for both a single and a double pulse. One thousand pulses were generated per intensity value. Finally, the DNA replication speed was estimated (Extended Data Fig. 3b,c) for the single-pulse speed (Methods ‘Single-pulse DNA replication speed estimate’) and double pulse (the average of the pairwise distances between the first and second pulse).

Pair correlation

The pair correlation was calculated as the pairwise distances between all reads in one cell per chromosome. For display, the count was calculated per 5-kb bin, distances >400 kb (except Extended Data Fig. 5 for which the max was 1 Mb) were discarded and the total counts were either sum normalized or range-scaled between 0 and 1.

Single-pulse DNA replication speed estimate

The width of the HMM-segmented pulses (‘HMM pulse segmentation’) can be calculated from the genomic coordinates of the first and last read in a pulse. However, because of sampling, this width is likely to be an underestimation of the actual traveled path of the polymerase. The pulse width was corrected for sampling similarly to the estimation of the maximum of a sampled uniform distribution, as follows:

$${{\mathrm{Pulse}}\; {\mathrm{width}}}=w+w/(n-1)$$

Where w is the genomic distance between the first and the last read in the pulse and n is the number of reads in the pulse.

Mixture model fits

A mixture model with four components (uniform, exponential, halve-normal and normal distribution) was fitted per cell using a custom expectation maximization algorithm with soft labels written in C++ and implemented using the R package Rcpp55. In the maximization step the parameters of the component distributions were updated using the weighted mean for the exponential and normal distribution, and the weighted variance for the half-normal and normal distribution. In addition, the mean of the exponential component was restricted to >1,000, and the previous probability of the exponential component was restricted to >0.01. The algorithm was run until a relative tolerance of 10−8 or a maximum of 100 iterations was reached.

Pair correlation simulation

In the case of sampling with equal intensity, to simulate the pair correlation, first the read locations were drawn from a uniform distribution with a minimum of −200 and a maximum of 200, and the number of reads was drawn from a Poisson distribution with a mean of 80. Reads, r, were then retained with the following logic:

$$-45\,{\times} s < r < -30{\times} s\wedge 30{\times} s < r < 45{\times} s\,$$

where the scaling s of the window was drawn from a truncated (at 0) normal distribution with a given mean and variance.

$$s\, \sim \,N(\,\mu ,\sigma {\rm{|}}x > 0)$$

In the case of unequal sampling intensity, first the double window was defined and scaled with a speed factor drawn from a normal distribution (truncated at 0) with a given mean and variance. The read locations were then drawn from a uniform distribution where the minimum and maximum were scaled according to the speed factor, and the number of reads was again drawn from a Poisson distribution with a mean of 80. Reads falling outside the earlier defined window were again discarded. For both scenarios this was done 2,000 times for every combination of mean and variance of the ground truth speed distribution, after which the pair correlation was calculated as described previously. Subsequently, fitted speed estimates were corrected for Poisson sampling artifacts by fitting a locally estimated scatterplot smoothing smoothing surface to the simulated data. After which, we predict the input mean or standard deviation parameter using the output mean and standard deviation. Finally, this model predicts the ground truth mean and standard deviation from the measured experimental mean and standard deviation.

scRNA-seq data analysis

Count tables from ref. 31 were filtered for the EU-labeled fraction of messenger RNA, and cells were assigned to S phase if the cell-cycle progression score was >0.333 or <0.75. The total counts per gene for the S-phase pseudo-bulk were then transformed by adding 1, log10-transformed and rounded.

$${{\mathrm{Transformed}}\; {\mathrm{count}}}={{\mathrm{round}}}({\log}_{10}({{\mathrm{counts}}}+1))$$

The genomic location was added to the genes using the hg38 Ensembl release 106. In the case of overlapping genes, the one with the higher count was given priority for the overlapping portion. Segmented pulses from the 15-min EdU dataset were overlapped with the expressed genes using the foverlaps function from data.table.

Chromatin immunocleavage sequencing

RPE-1 hTERT cells were processed similarly to in ref. 15. The following antibodies were used: anti-H3K9Me3 (1:100; Abcam, cat. no. ab8898), anti-H3K27Me3 (1:200; Cell Signaling Technologies, cat. no. C36B11) and anti-H3K36Me3 (1:2,000; ThermoFisher Scientific, cat. no. MA5-24687).

DNA replication speed comparison

Pair correlation distances from Δt = 75 min were overlapped with expressed genes from single cell 5-ethynyl-uridine sequencing (scEU-seq) on RPE-1 cells using the foverlaps function from data.table. Pair correlation distances were split between transcribed and nontranscribed regions and given weights depending on the fraction of distance overlap with the specific gene. Subsequently, these distances were weighted by the posterior estimate for the speed component, which we derived from the pair correlation mixture model. Finally, we plotted the maximum normalized density of the weighted speed derived distances for transcribed and nontranscribed regions. Statistical testing was performed by resampling from the density distribution and significance was determined by Student’s t-test.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.