Main

Correct chromosome segregation relies on a unique chromatin domain known as the centromere. Human centromeres are located on megabase-long1 chromosomal regions and are comprised of tandemly repeated arrays of an approximately 171 base pair (bp) element, termed α-satellite DNA2,3,4. CENP-A is a histone H3 variant5,6 that replaces histone H3 in chromatin assembled onto about 3% of α-satellite DNA repeats7,8, and is flanked by pericentric heterochromatin containing H3K9me2/3 (ref. 9). Nevertheless, α-satellite DNA sequences are neither sufficient nor essential for centromere identity2,10, as demonstrated by several measures including identification of multiple examples of acquisition of a new centromere (referred to as a neocentromere) at a new location coupled with inactivation of the original centromere11.

This has led to a consensus view that mammalian centromeres are defined by an epigenetic mark2. Use of gene replacement in human cells and fission yeast has identified the mark to be CENP-A-containing chromatin12, which maintains and propagates centromere function indefinitely by recruiting CENP-C and the 16-subunit constitutive centromere associated network (CCAN)13,14,15,16. We8 and others17 have shown that the overwhelming majority of human CENP-A chromatin particles are octameric nucleosomes containing two molecules of CENP-A at all cell cycle points, with heterotypic CENP-A/histone H3-containing nucleosomes comprising at most 2% of CENP-A-containing chromatin8.

During DNA replication, initially bound CENP-A is quantitatively redistributed to each daughter centromere18, while incorporation of new CENP-A into chromatin occurs only after exit from mitosis18,19, when its loading chaperone HJURP (ref. 20,21) is active22. Temporal separation of centromeric DNA replication from new CENP-A chromatin assembly raises the important question of how the centromeric epigenetic mark is maintained across the cell cycle, when it would be expected to be dislodged by DNA replication and diluted at each centromere as no new CENP-A is assembled until the next G1 (ref. 18). Moreover, endogenous CENP-A comprises only about 0.1% of the total histone H3 variants. Recognizing that a proportion of CENP-A is assembled at the centromeres with the remainder loaded onto sites on the chromosome arms7,8,23, long-term maintenance of centromere identity and function requires limiting accumulation of non-centromeric CENP-A. Indeed, artificially increasing CENP-A expression in human cells increases ectopic deposition at non-centromeric sites, accompanied by chromosome segregation aberrations23,24,25,26.

Using CENP-A chromatin immunoprecipitation (ChIP) and mapping onto centromere reference models for the centromere of each human chromosome, we now establish that DNA synthesis acts as an error correction mechanism to maintain epigenetically defined centromere identity by mediating precise reassembly of centromere-bound CENP-A chromatin, while removing ectopically loaded CENP-A found within transcriptionally active chromatin outside the centromeres.

Results

CENP-A binding at 23 human centromere reference models

We produced HeLa cells either (1) expressing CENP-ALAP, a CENP-A variant carboxy-terminally fused to a localization (enhanced yellow fluorescent protein, EYFP) and affinity (His) purification tag27 at one endogenous CENP-A allele (Supplementary Fig. 1a and Fig. 1a), or (2) stably expressing an elevated level (4.5 times the level of CENP-A in parental cells) of CENP-ATAP, a CENP-A fusion with carboxy-terminal tandem affinity purification (S protein and protein A) tags separated by a tobacco etch virus protease cleavage site (Supplementary Fig. 1b and Fig. 1a). Both CENP-A variants localize to centromeres (Supplementary Fig. 1c,d), support long-term centromere identity, and mediate high-fidelity chromosome segregation in the absence of wild-type CENP-A (refs. 7,8). Importantly, while the HeLa cells we use have acquired an aneuploid genome, they are chromosomally stable, with a high-fidelity centromere function that has maintained the same karyotype over almost two decades28.

Fig. 1: CENP-A ChIP-seq identifies CENP-A binding at reference centromeres of 23 human chromosomes.

(a) CENP-A ChIP-seq experimental design. TEV, tobacco etch virus; qPCR, quantitative real-time PCR. (b) Quantitative real-time PCR for α-satellite DNA in chromosomes 1, 3, 5, 10, 12 and 16. N = 2 for CENP-ATAP and 3 for CENP-ALAP, from biologically independent replicates. Error bars, s.e.m. (c) MNase digestion profile showing the nucleosomal DNA length distributions of bulk input mononucleosomes (upper panel) and purified CENP-ALAP following native ChIP at G1 and G2. The experiment was repeated independently twice with similar results. (d) Number of CENP-A binding peaks for α-satellite DNA in CENP-ATAP and CENP-A+/LAP cells at G1 and G2. The numbers represent peaks that overlap between the two biologically independent replicates. (e) CENP-A ChIP-seq shows CENP-A binding peaks at the centromere of chromosome 18 for CENP-ALAP and CENP-ATAP before and after DNA replication. CENP-A peaks across the reference model are a result of multimapping and their exact linear order is not known. SICER peaks are shown in black below the raw read data. Y axis shows read counts. Two replicates are shown for each condition. Source data for b and d can be found in Supplementary Table 4.

To immunopurify CENP-A-containing chromatin at G1 or G2, chromatin was isolated from synchronized cells (Fig. 1a and Supplementary Fig. 1e). Nuclease digestion was used to produce mononucleosomes from chromatin isolated at G1 and G2, yielding the expected 147 bp of protected DNA length for nucleosomes assembled with histone H3 (Fig. 1a,c, upper panel). In parallel, chromatin was also isolated from randomly cycling cells stably expressing TAP-tagged histone H3.1 (H3.1TAP—Supplementary Fig. 1b and Fig. 1a)13. CENP-ALAP-, CENP-ATAP- or H3.1TAP-containing chromatin was then affinity purified and eluted under mild conditions with PreScission or tobacco etch virus protease cleavage (Fig. 1a).

α-satellite DNA sequences were enriched 30–35-fold in DNA isolated from CENP-ATAP or CENP-A+/LAP cells (Fig. 1b), the expected enrichment since α-satellite DNA comprises about 3% of the genome8,29. While microcapillary electrophoresis of bulk input chromatin produced the expected 147 bp of protected DNA length for nucleosomes assembled with histone H3 (Fig. 1c, upper panel), isolated CENP-ALAP chromatin expressed at endogenous CENP-A levels produced DNA lengths centred on 133 bp, before and after DNA replication (Fig. 1c, lower panel), as previously reported for octameric CENP-A-containing nucleosomes with DNA unwinding at entry and exit8,30.
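The correspondence between the roughly 3% genomic share of α-satellite DNA and the observed 30–35-fold enrichment can be checked with a short calculation. The sketch below is illustrative only: it assumes that fold enrichment is the α-satellite fraction of the ChIP DNA divided by its fraction of the input DNA, and the function name is hypothetical rather than part of the analysis pipeline used here.

```python
def max_fold_enrichment(genome_fraction: float) -> float:
    """Upper bound on fold enrichment for a sequence class.

    Assumption: enrichment = (fraction in ChIP DNA) / (fraction in input DNA).
    The ChIP fraction cannot exceed 1.0, so the ceiling is 1 / genome_fraction.
    """
    if not 0.0 < genome_fraction <= 1.0:
        raise ValueError("genome_fraction must be in (0, 1]")
    return 1.0 / genome_fraction


# About 3% of the genome is alpha-satellite DNA (value taken from the text).
ceiling = max_fold_enrichment(0.03)
print(round(ceiling, 1))  # 33.3
```

Under this assumption, an enrichment of 30–35-fold lies close to the ceiling, bearing in mind that the 3% figure is itself approximate.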

Affinity-purified CENP-ALAP-, CENP-ATAP- and H3.1TAP-bound DNAs were sequenced and mapped (Fig. 1a,d and Supplementary Table 1) onto the centromere reference model of the human X chromosome31 and centromere models (incorporated into the HuRef genome hg38) for each human autosome that include the observed variation in α-satellite higher-order repeat (HOR) array sequences32,33. Sequences bound by CENP-A were identified (Fig. 1d,e and Supplementary Fig. 2) using algorithm-based scripts (SICER and MACS, refs. 34,35). CENP-ALAP expressed at endogenous levels mapped with high reproducibility across the centromeric regions of all 23 reference centromeres (see Fig. 1e for chromosome 18 and Supplementary Fig. 2 for the other 22). Sequences bound were largely unaffected by increasing CENP-A levels 4.5-fold in CENP-ATAP cells (Figs. 1e and 2a,b).

Fig. 2: Retention of centromeric CENP-A through DNA replication.

(a,b) CENP-A ChIP-seq raw mapping data (coloured) and SICER peaks (black lines, underneath) showing sequences bound by CENP-A (at both endogenous and increased expression levels) across the centromere reference model of chromosome 8, before and after DNA replication. Upper part, mapping of all reads (including reads that are multimapping) onto the repetitive α-satellite DNA. Lower part, read mapping to sites that are single copy in the HuRef genome (single mapping), after filtering out multimapping reads. Scale bar, 500 kb (a), 10 kb (b). (c) High-resolution view of read mapping to a site that is single copy in the centromere reference model of chromosome 8, marked by a purple bar in a. (d) High-resolution view of read mapping to a site that is single copy in the centromere reference model of chromosome 8, marked by a green bar in a. (e,f) High-resolution view of read mapping to a site that is single copy in the centromere reference model of chromosome 2 (e) and in the centromere reference model of chromosome 10 (f). The experiment shown in a–f was repeated independently twice with similar results.

Centromeric α-satellite arrays varied widely in CENP-A binding, from 10.5-fold enrichment for array D3Z1 in cen3 to 213-fold for array GJ211930.1 in cen10, with most enriched 20–40-fold relative to input DNAs (Supplementary Table 2). In 6 of the 17 centromeres that contain more than one α-satellite array, only one array actively bound CENP-A (Supplementary Table 2). In the other 11 of these centromeres (Supplementary Table 2), CENP-A binding was enriched in two or more arrays, consistent with CENP-A binding to a different array in each homologue, as previously shown for cen17 in two diploid human cell lines36. The increased levels of CENP-A in CENP-ATAP cells did not increase the number of centromeric binding peaks (Fig. 1d,e), but did elevate CENP-A occupancy at some divergent monomeric α-satellite repeats (Supplementary Fig. 1f) or within HORs (Supplementary Fig. 1g), with both examples occurring in regions with few CENP-B boxes.

CENP-A nucleosomes are retained at centromeric loading sites after DNA replication

Despite the known redistribution of initially centromere-bound CENP-A onto each of the new daughter centromeres without addition of new CENP-A (ref. 18), comparison of the sequences bound by CENP-A in G1 with those bound in G2 revealed that for all 23 centromeres, at both normal (CENP-A+/LAP) and elevated (CENP-ATAP) levels, CENP-A was bound to indistinguishable α-satellite sequences before and after DNA replication (Fig. 1e and Supplementary Fig. 2). Indeed, almost all (87%) of the α-satellite binding peaks algorithmically identified for CENP-ATAP during G1 remained at G2 (Supplementary Fig. 1h, top). A similarly high proportion (89%) of the CENP-A peaks found in G1 remained at G2 in CENP-ALAP cells with CENP-A expressed at endogenous levels (Supplementary Fig. 1h, bottom). After filtering out multimapping reads, 96 single-copy, centromeric CENP-A binding sites were identified within the HORs of the 23 reference centromeres (Fig. 2). Remarkably, examination of these before and after DNA replication in CENP-ATAP cells revealed quantitative retention of CENP-A in G2 at almost all (93 of 96) of these unique centromeric sites, with the remaining three peaks only slightly diminished (Fig. 2 and Supplementary Fig. 1i).
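The retention percentages above amount to asking what fraction of G1 peaks is still matched by a peak in G2. A minimal sketch of such a comparison follows. It assumes peaks are given as (chromosome, start, end) intervals and that one shared base is enough to call a peak retained; the overlap criterion actually applied to the SICER peaks is not stated here, and the function name and toy coordinates are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Peak = Tuple[str, int, int]  # (chromosome, start, end), half-open coordinates


def retained_fraction(g1: List[Peak], g2: List[Peak], min_overlap: int = 1) -> float:
    """Fraction of G1 peaks that share at least min_overlap bases with a G2 peak."""
    g2_by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in g2:
        g2_by_chrom[chrom].append((start, end))

    retained = 0
    for chrom, start, end in g1:
        # Overlap length of two intervals is min(ends) - max(starts).
        if any(min(end, e) - max(start, s) >= min_overlap
               for s, e in g2_by_chrom[chrom]):
            retained += 1
    return retained / len(g1) if g1 else 0.0


# Toy coordinates for illustration only (not data from this study).
g1_peaks = [("chr8", 100, 300), ("chr8", 500, 700), ("chr4", 50, 150)]
g2_peaks = [("chr8", 250, 400), ("chr8", 900, 1000)]
print(retained_fraction(g1_peaks, g2_peaks))  # one of three G1 peaks is retained
```

Expressed the same way, the 93 of 96 single-copy centromeric sites retained in G2 correspond to a fraction of about 0.97.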

Ectopic CENP-A assembled onto chromosome arms in early G1 is removed by G2

In addition to the striking enrichment at centromeric α-satellites, genome-wide mapping of CENP-A-bound DNAs revealed preferential and highly reproducible incorporation into unique-sequence, non-α-satellite sites on the arms of all 23 chromosomes (Figs. 3a,b and 4). At endogenous CENP-A levels, 11,390 ectopic sites were identified, 620 of which were loaded ≥5-fold over background (Fig. 3d; Supplementary Fig. 3a). Sites enriched for bound CENP-A were essentially identical in DNAs from randomly cycling cells or G1 cells (Fig. 3a,b). While a 4.5-fold increase in CENP-A levels in CENP-ATAP cells did not increase the number of centromeric binding peaks (Fig. 1d) or the number of unique single-copy sites within centromeric HORs (Supplementary Fig. 1i, bottom), it drove an increased number of sites of CENP-A incorporation on the arms, producing 40,279 non-centromeric sites (Fig. 3d), 12,550 of which were loaded ≥5-fold over background (Supplementary Fig. 3a).

Fig. 3: Sites of CENP-A assembly onto chromosome arms in early G1 are removed by G2.

(a) ChIP-seq raw mapping data (coloured) and SICER peaks (black lines, underneath) showing sequences bound by CENP-A (at both endogenous and increased expression levels) across chromosome 4 before and after DNA replication. Read counts were scaled to 30 but reached 150 at the centromere. (b) ChIP-seq data for a region within the p-arm of chromosome 4, with two replicates for each time point, for CENP-ATAP (increased CENP-A expression), CENP-ALAP (endogenous level) and H3.1TAP. (c) High-resolution nucleosomal view of CENP-ATAP mapping data at G1 and G2 at a non-centromeric site of chromosome 4. The experiments in a–c were repeated independently twice with similar results. (d) Total number of non-α-satellite CENP-A binding sites for CENP-ATAP and CENP-ALAP at G1 and G2. The numbers represent the average of the two sequencing replicates per time point. (e) Quantitative real-time PCR following CENP-A ChIP from DLD1 cells with auxin-degradable CENP-AAID and a doxycycline-inducible CENP-AWT (ref. 37) after synchronization in G1 or in mitosis (as shown in Supplementary Fig. 3b) for sites on the arms of chromosomes 1, 5, 9 and 14. Sites for quantitative real-time PCR were chosen based on identification of ectopic deposition of CENP-A at these locations in HeLa cells. Levels of CENP-A enrichment at mitosis were normalized to the level of enrichment at G1. Results of two independent experiments for G1 and three for mitosis are shown. Source data for d and e can be found in Supplementary Table 4.

Fig. 4: Ectopic CENP-A is removed following DNA replication from the arms of all 23 human chromosomes.

CENP-ATAP ChIP-seq raw mapping data at G1 and G2 for all human chromosomes. Chromosome X shows a spike of CENP-A enrichment not removed by G2 (marked by an asterisk). The experiment was repeated independently twice with similar results.

Remarkably, for all 23 human chromosomes and for CENP-A accumulated to endogenous (CENP-ALAP) or increased (CENP-ATAP) expression levels, passage from G1 to G2 almost eliminated enrichment of CENP-A binding to specific sites on the chromosome arms, while leaving α-satellite-bound sequences unaffected (Figs. 1d, 3 and 4). Loss by G2 of CENP-A binding at specific arm sites was highly reproducible (see experimental replicates in Fig. 3b). Scoring peak binding sites with thresholds of ≥5-, 10- or 100-fold of CENP-A binding over background, at least 90% of sites bound on chromosome arms in G1 in CENP-ATAP cells were removed by early G2 (Supplementary Fig. 3a), and all of those still identified in G2 were substantially reduced in peak height.

Ectopic CENP-A removal after DNA replication was confirmed using CENP-A ChIP following synchronization in G1 or mitosis (Supplementary Fig. 3b) in a second, nearly diploid human cell line (DLD1) in which the two CENP-A alleles were modified to encode a degron-tagged, auxin-inducible degradable CENP-AAID or a doxycycline-inducible CENP-AWT (ref. 37). Levels of CENP-A loaded at each of four ectopic sites before (G1) and after DNA replication revealed that almost all (85–90%) of ectopic CENP-A loaded in G1 was removed by mitotic entry (Fig. 3e).

Neocentromeres are not at sites of ectopic CENP-A loading

Recognizing that ectopic loading of CENP-A on chromosomes could be one component of neocentromere formation, we tested if the positions of known human neocentromeres38 are sites of preferential ectopic CENP-A loading. Despite cytogenetic positioning of many reported neocentromeres11, only two have been precisely mapped39. The first (PDNC4) spans 300 kilobases (kb) on chromosome 4 (ref. 40) (87.278 to 87.578 megabases (Mb) in hg38). In CENP-ATAP cells, only four sites of elevated CENP-A were present in G1 within the genomic region of this neocentromere, only two of which had CENP-A loading more than fivefold over the background. Importantly, all sites were removed in G2-derived chromatin (Fig. 5a). An additional neocentromere (IMS13q) maps to a 100 kb region on chromosome 13 (ref. 41) (97.047 to 97.147 Mb in hg38 assembly39). CENP-A binding to this region in CENP-ATAP cells in G1 did not differ in density of peaks or peak heights from many similarly loaded sites scattered across the long arm of chromosome 13 (Supplementary Fig. 3c). Passage from G1 to G2 of CENP-ATAP cells stripped almost all CENP-A bound that corresponded to the region of the IMS13q neocentromere. While we cannot exclude the possibility that other cell types have different epigenetic landscapes that affect the sites to which CENP-A binds at non-centromeric regions, our examination of these two best defined neocentromeres offers no support for neocentromere formation arising at the site of an inherent hotspot of ectopic CENP-A loading.

Fig. 5: Neocentromeres are not positioned at sites of preferential CENP-A loading on chromosome arms.

(a) Read mapping data of CENP-ATAP ChIP-seq at G1 and G2, at the chromosomal location of a known patient-derived neocentromere39 found in chromosome 4. The experiment was repeated independently twice with similar results. (b) More than 80% of CENP-A SICER peaks (≥5-fold over background) in randomly cycling and G1 cells overlap with DNase I hypersensitive sites taken from the ENCODE project. (c) Example from the chromosome 4 p-arm showing overlap of at least 100 bases between SICER peaks (≥5-fold over background) and HeLa S3 DNase I hypersensitive sites taken from the ENCODE project. The experiment was repeated independently twice with similar results. (d,e) CENP-ATAP (d) or CENP-ALAP (e) enrichment levels at DNase I hypersensitive sites, H3K4me1 and H3K4me2 sites. SICER peaks (≥5-fold over background) supported between the two replicates were analysed for their enrichment level at DNase I hypersensitive sites, with minimum overlap of 100 bases, compared with the level of enrichment at these sites by chance. SICER peaks (≥5-fold over background) supported between the two replicates of CENP-ATAP (d) or CENP-ALAP (e) were analysed for their degree of overlap with known sites of H3K4me1 and H3K4me2 binding taken from ENCODE HeLa S3 datasets. Source data for d and e can be found in Supplementary Table 4.

CENP-A is ectopically loaded at early G1 into open/active chromatin

The sites on the chromosome arms into which CENP-A was assembled in G1 in CENP-ATAP cells were enriched twofold (compared with levels expected by chance) at promoters or enhancers of expressed genes, with a 2.5-fold enrichment at sites bound by the transcriptional repressor CTCF (Supplementary Fig. 3e), trends similar to previous reports for cells with increased CENP-A23. More than 80% of CENP-ATAP binding sites on chromosome arms with peak heights ≥5-fold over background (Fig. 5b–d) overlapped with DNase I hypersensitive, accessible chromatin sites identified by the ENCODE project that are functionally related to transcriptional activity. Similarly, CENP-A expressed at endogenous levels was enriched threefold at DNase I hypersensitive sites (Fig. 5e) and promoters (Supplementary Fig. 3g).
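The overlap statistics above combine two ingredients: a rule for calling a peak overlapping (at least 100 shared bases, as stated in the Fig. 5 legend) and a chance expectation against which the observed fraction is compared. The sketch below illustrates one way to compute both on a single toy chromosome. How the chance level was estimated in this study is not described here, so the uniform-placement model, the function names and all coordinates are assumptions made purely for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), half-open, on one chromosome


def overlap_bases(a: Interval, b: Interval) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


def observed_fraction(peaks: List[Interval], sites: List[Interval],
                      min_overlap: int = 100) -> float:
    """Fraction of peaks sharing at least min_overlap bases with any site."""
    hits = sum(any(overlap_bases(p, s) >= min_overlap for s in sites) for p in peaks)
    return hits / len(peaks)


def chance_fraction(peak_len: int, sites: List[Interval], chrom_len: int,
                    min_overlap: int = 100) -> float:
    """Probability that a peak placed uniformly at random meets the overlap rule.

    Assumes sites lie away from the chromosome ends and far enough apart that
    no single placement can qualify for two sites at once.
    """
    placements = chrom_len - peak_len + 1
    qualifying = 0
    for start, end in sites:
        if peak_len >= min_overlap and end - start >= min_overlap:
            # Valid peak starts run from start + min_overlap - peak_len to end - min_overlap.
            qualifying += (end - start) + peak_len - 2 * min_overlap + 1
    return qualifying / placements


# Toy hypersensitive sites and 200-base peaks on a 10 kb chromosome (not real data).
toy_sites = [(1000, 1300), (5000, 5150)]
toy_peaks = [(950, 1150), (5020, 5220), (7000, 7200), (1250, 1450)]
fold_over_chance = observed_fraction(toy_peaks, toy_sites) / chance_fraction(200, toy_sites, 10_000)
print(fold_over_chance)
```

In practice a chance level is often estimated empirically by shuffling peak positions; the closed-form count is used here only to keep the example deterministic.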

Most (80%) of the non-centromeric CENP-ATAP binding peaks overlapped with H3K4me1 and H3K4me2 sites found in active and primed enhancers and at transcription factor binding sites identified by ENCODE, with a similar trend for CENP-ALAP (Fig. 5d,e). Ectopic CENP-ATAP or CENP-ALAP (Supplementary Fig. 4a,b) peaks also showed a significant overlap with other marks of active transcription, including H2A.Z, H3K4me3, H3K27ac, H3K36me3 and H3K9ac. Conversely, neither CENP-ATAP nor CENP-ALAP was enriched at H3K27me3 sites tightly associated with inactive gene promoters and facultatively repressed genes (Supplementary Figs. 3d,f and 4a,b). CENP-A binding peaks showed a mild (30–40%) overlap with histone modifications H4K20me1 and H3K79me2 (active transcription marks) or the H3K9me3 mark of transcription repression (Supplementary Fig. 4a,b). Overall, most (65% and 93%, respectively) of the ectopic CENP-A sites in cells with endogenous or elevated CENP-A were associated with at least one active transcription mark (Supplementary Fig. 4a,b), consistent with ectopic CENP-A preferentially bound to open, active chromatin.

Ectopic CENP-A in G1 in either CENP-ATAP or CENP-ALAP cells that was bound to ‘high-occupancy target’ regions (highly expressed regions of the genome that show binding of unrelated transcription factors without underlying sequence specificity42) was almost quantitatively removed from cells that had progressed through DNA replication (Supplementary Fig. 3h,i), demonstrating that enrichment of CENP-A in such highly expressed regions cannot be a consequence of non-specific binding to ‘hyper-ChIPable’ regions43.

Analysis of published CENP-A ChIP sequencing (ChIP-seq) datasets for HT1080 cells44, a human epithelial fibrosarcoma cell line expressing Flag-tagged CENP-A at about three times the level of parental CENP-A, revealed similar trends: ectopic sites were enriched at DNase I hypersensitive sites and at transcription activation marks (H2A.Z and H3K4me1/2) but were not enriched at the transcription repression mark H3K27me3 (Supplementary Fig. 4c). Similarly, CENP-A ChIP-seq datasets from the HuRef human lymphoblastoid cell line45 also revealed that the majority (51%) of ectopic CENP-A accumulated to its endogenous level was found at marks associated with transcription activation (including H2A.Z, H3K36me3 and H4K20me1). CENP-A was not enriched at the transcription repression mark H3K27me3 (Supplementary Fig. 4d), although in all three cell lines analysed ectopic CENP-A was enriched at H3K9me3 sites associated with heterochromatin of constitutively repressed genes (Supplementary Fig. 4a–d).

Ectopic, but not centromeric, CENP-A is removed by replication fork progression

We next tested whether removal by G2 of CENP-A assembled into nucleosomes at unique sites on the chromosome arms is mediated by the direct action of the DNA replication machinery. CENP-ATAP was affinity purified from mid-S-phase cells and CENP-A-bound DNAs were sequenced and mapped (Fig. 6a and Supplementary Fig. 5a). In parallel, newly synthesized DNA in synchronized cells was labelled by addition of bromodeoxyuridine (BrdU) for 1 h at early (S0–S1), mid- (S3–S4) and late S phase (S6–S7) (Fig. 6a and Supplementary Fig. 5a). Genomic DNA from each time was sonicated (Supplementary Fig. 5b) and immunoprecipitated with a BrdU antibody (Fig. 6a). Eluted DNA was then sequenced and mapped to the genome (an approach known as Repli-seq46), yielding regions of early-, mid- and late-replicating chromatin (an example from a region of an arm of chromosome 20 is shown in Fig. 6b). Early replication timing was validated (Supplementary Fig. 5c) for two genes (MRGPRE and MMP15) previously reported to be early replicating (ENCODE Repli-seq46). Similarly, a gene (HBE1) and a centromeric region (Sat2) previously reported to be late replicating (ENCODE Repli-seq46) were confirmed to be replicated late (Supplementary Fig. 5d).

Fig. 6: Ectopic CENP-A is removed contemporaneously with replication fork progression, while centromeric CENP-A is retained.

(a) Experimental design of CENP-A ChIP-seq combined with Repli-seq. (b) Raw mapping data of CENP-ATAP ChIP-seq at G1 and BrdU Repli-seq at early S, mid-S and late S/G2 at the q-arm of chromosome 20. (c) Percentage of ectopic G1 CENP-ATAP ≥5-fold binding sites in early-, mid- or late-S-replicating regions. (d) Percentage of BrdU SICER peaks in α-satellite DNA found within early-, mid- or late-S-replicating regions. (e) CENP-A ChIP-seq raw mapping data at cen18 at G1, mid-S phase and G2, and BrdU Repli-seq at early S, mid-S and late S/G2. (f) High-resolution view of CENP-A mapping during DNA replication (mid-S) at a single-copy variant (marked by a purple bar in Fig. 2a) in the centromere reference model of chromosome 8. Data from Fig. 2c for CENP-ATAP G1 and G2 are included for comparison. (g,i,k) Raw mapping data and SICER peaks of CENP-ATAP ChIP-seq at G1, mid-S phase and G2, and BrdU-labelled Repli-seq samples showing regions going through replication in early S, mid-S and late S/G2 phase within the p-arm of chromosome 4 (g), the q-arm of chromosome 10 (i) and the p-arm of chromosome X (k). (h) Number of early replicating CENP-ATAP G1 peaks (≥5-fold over background) retained at mid-S phase. (j) Number of mid-S-replicating CENP-ATAP G1 peaks (≥5-fold over background) retained at G2. (l) Number of late-replicating CENP-ATAP G1 peaks (≥5-fold over background) retained at G2. Source data for c,d,h,j,l can be found in Supplementary Table 4. Data shown in b–l are from two biologically independent experiments.

CENP-A chromatin immunoprecipitated from early- and mid-S-phase cells yielded levels of α-satellite DNA enrichment (Supplementary Fig. 5e) similar to those achieved at G1 phase (Fig. 1b). Furthermore, nucleosomal CENP-A chromatin produced by micrococcal nuclease digestion protected 133 bp of DNA at early and mid-S phase (Supplementary Fig. 5f), just as it did in G1 and G2 (Fig. 1c; see also ref. 8), with no evidence for a structural change from hemisomes to nucleosomes and back to hemisomes during S phase as previously claimed47. Mapping of CENP-A binding sites on chromosome arms, combined with Repli-seq, revealed that almost all (91%) ectopic G1 CENP-A binding was found in early- or mid-S-replicating regions (Fig. 6b,c). While alphoid DNA sequences have been reported to replicate in mid-to-late48 S phase, in our cells α-satellite-containing DNAs in all 23 centromeres were found almost exclusively to be late replicating (Fig. 6d).
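Combining CENP-A peaks with Repli-seq data amounts to assigning each peak a replication-timing class and then tallying the classes. The following sketch shows that bookkeeping under a deliberately simple, assumed rule: each peak takes the S-phase fraction in which its BrdU signal is highest. The rule, the function names and the toy signal values are all hypothetical and are not the procedure used to generate Fig. 6c,d.

```python
from collections import Counter
from typing import Dict, List

PHASES = ("early", "mid", "late")


def timing_class(brdu_signal: Dict[str, float]) -> str:
    """Label a region by the S-phase fraction with the highest BrdU signal.

    Assumed rule for illustration; ties resolve to the earlier phase.
    """
    return max(PHASES, key=lambda phase: brdu_signal[phase])


def timing_percentages(peaks: List[Dict[str, float]]) -> Dict[str, float]:
    """Percentage of peaks falling in early-, mid- and late-replicating regions."""
    if not peaks:
        return {phase: 0.0 for phase in PHASES}
    counts = Counter(timing_class(signal) for signal in peaks)
    return {phase: 100.0 * counts[phase] / len(peaks) for phase in PHASES}


# Toy BrdU signals at four hypothetical ectopic peaks (not data from this study).
toy_signals = [
    {"early": 9.0, "mid": 2.0, "late": 1.0},
    {"early": 1.0, "mid": 8.0, "late": 2.0},
    {"early": 7.0, "mid": 3.0, "late": 0.5},
    {"early": 0.2, "mid": 1.0, "late": 6.0},
]
print(timing_percentages(toy_signals))
```

Summing the early and mid percentages gives the kind of figure quoted above for ectopic sites (91% in early- or mid-S-replicating regions).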

Remarkably, throughout S phase, centromere-bound CENP-A found in G1 was completely retained across each reference centromere with the same sequence binding preferences (Fig. 6e and Supplementary Fig. 5g). Retention of CENP-A binding during DNA replication was also observed at the unique-sequence-binding sites within the HORs of each centromere: all 96 CENP-ATAP G1 peaks at single-copy variants within α-satellite HORs remained bound by CENP-A (Fig. 6f and Supplementary Fig. 5h). In contrast, early-replicating ectopic CENP-A-binding sites were nearly quantitatively removed during or quickly after their replication and were no longer visible in mid-S phase (Fig. 6g,h). Similarly, ectopic CENP-A-binding sites found in mid-S-replicating regions remained at mid-S but were removed quickly after that and were absent by late S/G2 (Fig. 6i,j). For the 10% of ectopic CENP-A G1 peaks in late-S-replicating regions (Fig. 6c), almost all (85%) were removed by G2 (Fig. 6k,l), while late-replicating centromeric CENP-A peaks were retained, including the single-copy variants within the α-satellite HORs (Figs. 6d–f and 2 and Supplementary Fig. 5g,h). Thus, ectopic, but not centromeric, CENP-A-binding sites are removed as DNA replication progresses.

CENP-C/CCAN remain centromeric CENP-A associated during DNA replication

To comprehensively determine the components that associate with CENP-A chromatin during replication in late S, we used mass spectrometry following affinity purification of CENP-A nucleosomes (Supplementary Fig. 5i, left). A structural link that normally bridges multiple centromeric CENP-A nucleosomes and nucleates the full kinetochore assembly before mitotic entry is the 16-subunit constitutive centromere associated network (CCAN)49,50,51,52. This complex is anchored to CENP-A primarily through CENP-C50,53,54 and sustained by CENP-B binding to CENP-B box sequences within α-satellite DNAs55. Remarkably, mass spectrometry identified that all 16 CCAN components13,15 remained associated with mononucleosomal CENP-A chromatin that was affinity purified from late S/G2 (Table 1). Stable association with CENP-A was also seen for HJURP, multiple chromatin remodelling factors and nuclear chaperones, histones, centromere and kinetochore components, and other DNA replication proteins (Supplementary Fig. 5j–n). The continuing interaction during DNA replication of CCAN proteins with CENP-A, which is maintained even on mononucleosomes, provides strong experimental support that the CCAN complex tethers CCAN-bound centromeric CENP-A at or near the centromeric DNA replication forks, thereby enabling its efficient reincorporation after replication fork passage.

Table 1: Mass spectrometry of individual CENP-A nucleosomes reveals that all CCAN components coprecipitated with CENP-A at late S/G2

To test this further, the composition of CENP-A-containing nucleosomal complexes from G1 to late S/G2 was determined following affinity purification (via the TAP tag) of chromatin-bound CENP-ATAP from a predominantly mononucleosome pool (Supplementary Fig. 5i, right). We initially focused on the chromatin assembly factor 1 (CAF1) complex, which is required for de novo chromatin assembly following DNA replication56. Its p48 subunit (also known as CAF1 subunit c, RbAp48 or RBBP4) binds histone H4 (ref. 57), is a binding partner in a CENP-A prenucleosomal complex with HJURP and nucleophosmin (NPM1)21 and maintains the deacetylated state of histones in the central core of centromeres after deposition58. Remarkably, CAF1 p48 co-immunopurified with CENP-A from G1 to late S/G2 (Fig. 7a). In striking contrast, the two other CAF1 subunits (CAF1 p150 and CAF1 p60), which are essential for de novo chromatin assembly in vitro59, remained much more strongly associated with CENP-A nucleosomes in late S/G2 compared with mid-S (Fig. 7a). Additionally, MCM2, a core subunit of the DNA replicative helicase MCM2-7 complex that recycles old histones as the replication fork advances60, was robustly copurified with CENP-A only in late-S-phase-derived chromatin, with no association detected in mid-S (Fig. 7a).

Fig. 7: CENP-C and the CCAN complex are essential for the epigenetic maintenance of human centromeres during DNA replication.

(a) CENP-ATAP was immunoprecipitated from micrococcal-nuclease-resistant chromatin isolated in different cell cycle phases and the immunoprecipitates were examined by immunoblotting for CAF1 complex subunits and MCM2. The experiment was repeated independently twice with similar results. Unprocessed film scans of immunoblots can be found in Supplementary Fig. 6. (b) Experimental design to test the role of CENP-C and CCAN complex in retention of centromeric CENP-A during DNA replication. The experiment was repeated independently three times with similar results. (c) Fluorescence-activated cell sorting (FACS) analysis of DNA content showing the synchronization efficiency of CENP-CAE/AE DLD1 cells during the experiment shown in b. The experiment was repeated independently three times with similar results. (d) Live-cell imaging of CENP-CAE/AE DLD1 cells following addition of auxin (IAA) at 0 min. DNA is labelled with SiR-DNA. The experiment was repeated independently three times with similar results. NT, not treated. (e) Quantitative real-time PCR for α-satellite DNA in chromosomes 1, 3, 5, 10, 12 and 16. N = 3 from three biologically independent experiments. Error bars, s.e.m. P value determined using two-tailed t-test. (f,g) Quantitative real-time PCR for single-mapping α-satellite DNA variant in chromosome 8 (f) and chromosome 15 (g). N = 3 from three biologically independent experiments. Error bars, s.e.m. P value determined using two-tailed t-test. Source data for e–g can be found in Supplementary Table 4. (h) Model for maintaining centromeric CENP-A while removing it from non-centromeric sites on the chromosome arms during DNA replication to ensure maintenance of centromere identity across the cell cycle.

CENP-C is essential for the maintenance of centromeric CENP-A during DNA replication

The stable association only in late S phase (when all centromeric, but only a small minority of ectopically loaded CENP-A, is replicated) of CENP-A with MCM2 and the CAF1 subunits necessary for chromatin reassembly suggested that CENP-C and its CCAN complex tethered centromeric CENP-A near the replication forks and stabilized CENP-A binding to MCM2 and CAF1, thereby enabling CENP-A reassembly onto the daughter centromeres after DNA replication. We tested this possibility by rapidly depleting CENP-C just after S-phase entry in a human cell line (CENP-CAE/AE) in which both CENP-C alleles were genome-engineered to produce CENP-C fused to both an auxin-inducible degron and EYFP55. Thymidine was used to synchronize these CENP-CAE/AE cells at the G1/S boundary (Fig. 7b,c). CENP-C degradation was induced just after S-phase entry to test CENP-C’s role specifically during centromeric DNA replication in late S phase (Fig. 7b), but without affecting the deposition of new CENP-A that occurs in early G1.

Auxin addition 2 h after release from thymidine block resulted in polyubiquitination and degradation of almost all CENP-CAE within 15 min, as was evident by the loss of fluorescence of the EYFP in CENP-CAE (Fig. 7b–d and Supplementary Video 1). DNA replication was then allowed to continue without CENP-C and the CCAN complex it nucleates53,55. At the end of DNA replication, chromatin-bound CENP-A was immunoprecipitated and the enrichment of α-satellite-containing DNA was determined. In randomly cycling cells this resulted in a 30-fold enrichment of alphoid DNA (Fig. 7e). At the end of DNA replication and distribution of CENP-A to the two daughter centromeres in CENP-C-containing cells, alphoid DNA enrichment was reduced by half, as expected from doubling of centromeric DNA without addition of new CENP-A. However, degradation of CENP-C early in DNA replication led to loss by the end of S phase of most (73%) of CENP-A initially bound to α-satellite DNA (Fig. 7e).

CENP-C-dependent retention of centromeric CENP-A late in S phase was confirmed by examination of two specific α-satellite variants found within the HORs of the centromeres of chromosomes 8 (Fig. 7f) and 15 (Fig. 7g). Each of these satellite variants is represented only once in the human genome and each shows precise retention in G2 of CENP-A bound in G1 (Fig. 2). In both variants, α-satellite DNA was enriched 50–60-fold following CENP-A ChIP from randomly cycling cells, which was reduced to half as much after DNA replication. Following CENP-C degradation in early S phase, however, CENP-A was not retained at either site during DNA replication (Fig. 7f,g). Taken together, these results demonstrate that depletion of CENP-C (and CCAN bound to CENP-A53,55) before centromere DNA replication results in loss of centromeric CENP-A by the end of S phase.

Discussion

Using reference models for 23 human centromeres, we have identified that during DNA replication CENP-A nucleosomes initially assembled onto centromeric α-satellite repeats are reassembled onto the same spectrum of α-satellite repeat sequences of each daughter centromere as are bound before DNA replication. Additionally, genome-wide mapping of sites of CENP-A assembly has identified that when CENP-A is expressed at endogenous levels the selectivity of the histone chaperone HJURP’s loading in early G1 of new CENP-A at or near existing sites of centromeric CENP-A-containing chromatin is insufficient to prevent its loading onto more than 11,000 sites along the chromosome arms (Fig. 3d). We also show that the number of ectopic sites increases as CENP-A expression levels increase, as has been reported in multiple human cancers39,61,62. Sites of ectopic CENP-A are replicated in early and mid-S (Fig. 6c) and are nearly quantitatively removed as DNA replication progresses (Fig. 6g–l).

Taken together, our evidence demonstrates that DNA replication functions not only to duplicate centromeric DNA but also as an error correction mechanism to maintain epigenetically defined centromere position and identity by coupling centromeric CENP-A retention with its removal from assembly sites on the chromosome arms (Fig. 7h). Indeed, our data reveal that CENP-A loaded onto unique, single-copy sites within α-satellite DNAs of the 23 reference centromeres is precisely maintained at these sites during and after DNA replication, offering direct support that (at least for each of these single-copy sites) the replication machine reloads CENP-A onto the exact same centromeric DNA site (Figs. 2 and 6f).

DNA replication produces a very different situation for CENP-A initially assembled into nucleosomes on the chromosome arms. Sites of ectopically loaded CENP-A are nearly quantitatively stripped during DNA replication (Figs. 3, 4 and 6g–l), thereby precluding premitotic acquisition of CENP-A-dependent centromere function at non-centromeric sites and reinforcing centromere position and identity (Fig. 7h). Without such correction, ectopically loaded sites would be maintained cell cycle after cell cycle, potentially recruiting CENP-C and assembly of the CCAN complex13,14,15,16. Arm-associated CENP-A/CCAN would present a major problem for faithful assembly and function of a single centromere/kinetochore per chromosome, both by acquisition of partial centromere function and by competition with the authentic centromeres for the pool of available CCAN components. Indeed, high levels of CENP-A expression (1) lead to recruitment of detectable levels of 3 of 16 CCAN components (CENP-C, CENP-N and Mis18) assembled onto the arms23,24,63, (2) are associated with ongoing chromosome segregation errors25 and (3) have been reported in several cancers, where they have been proposed to be associated with increased invasiveness and poor prognosis26,61,62.

As to the mechanism for retention during DNA replication of centromeric but not ectopically loaded CENP-A, our mass spectrometry analysis identifies a strong association of HJURP with CENP-A mononucleosomes in late S phase, comparable to the association identified in G1 (Supplementary Fig. 5l), supporting a probable role for HJURP in CENP-A retention, perhaps through interaction with the MCM2-7 complex, as has previously been suggested21. This agrees with evidence that HJURP can associate with MCM2 in a histone-independent manner60 and with a possible co-chaperone relationship for CENP-A. Moreover, degradation of HJURP in early S reduces centromeric CENP-A retention through S phase64.

Most importantly, our evidence demonstrates that the local reassembly of CENP-A within centromeric domains requires the continuing centromeric CENP-A association with CCAN complexes (Fig. 7), which act to tether disassembled CENP-A/H4 near the sites of centromere DNA replication. This local CENP-C/CCAN-dependent retention of CENP-A, coupled with the actions of the MCM2 replicative helicase, HJURP and CAF1, enables CENP-A’s precise reassembly into chromatin within each daughter centromere, thereby maintaining epigenetically defined centromere identity.

Methods

Cell lines

Adherent HeLa cells stably expressing CENP-ATAP or H3.1TAP by retrovirus infection13 or endogenously tagged CENP-A+/LAP by infection of a rAAV harbouring a LAP-targeting construct containing homology arms for CENP-A27 were adapted to suspension growth by selecting surviving cells and were maintained in DMEM (Gibco) containing 10% fetal bovine serum (Omega Scientific), 100 U ml−1 penicillin, 100 U ml−1 streptomycin and 2 mM l-glutamine at 37 °C in a 5% CO2 atmosphere with 21% oxygen. DLD1 cells with auxin-degradable CENP-AAID and doxycycline-inducible CENP-AWT37 and DLD1 CENP-CAE/AE 55 were maintained in the same conditions. Cells were maintained and split every 4–5 d according to ATCC recommendations.

Cell synchronization

Cells were synchronized as previously described8. Briefly, suspension HeLa cells were treated with 2 mM thymidine in complete medium for 19 h, pelleted and washed twice in PBS, and released in complete medium containing 24 μM deoxycytidine for 9 h followed by addition of thymidine to a final concentration of 2 mM for 16 h, after which cells were released again into complete medium containing 24 μM deoxycytidine. For G2, cells were collected 7 h after release from the second thymidine block. For G1, thymidine was added for a third time, 7 h after the release, and cells were collected 11 h after this (a total of 18 h after the release from the second thymidine block). Adherent DLD1 cells with auxin-degradable CENP-AAID and doxycycline-inducible CENP-AWT (ref. 37) were synchronized at G1 using 1.5 μM CDK4/6 inhibitor PD-0332991 (also known as Palbociclib) for 30 h or at mitosis using a single 2 mM thymidine block for 20 h and release into 0.1 mg ml−1 nocodazole. Adherent DLD1 CENP-CAE/AE (ref. 55) were synchronized at G1/S with a single thymidine block, followed by washing twice in PBS, and releasing in complete medium containing 24 μM deoxycytidine. Two hours after release, CENP-C rapid degradation was induced by addition of IAA at a final concentration of 500 μM. Nocodazole (Sigma) was added at 0.1 mg ml−1 to prevent cells from going into the next cell cycle. Cells were collected 8 h after release.

Chromatin extraction and affinity purification

Chromatin was extracted from 1 × 10⁹ nuclei of HeLa cells as previously described8. TAP- or LAP-tagged chromatin was purified in two steps. In the first step, native TAP-tagged chromatin was immunoprecipitated by incubating the bulk soluble mononucleosome pool with rabbit IgG (Sigma-Aldrich) coupled to Dynabeads M-270 epoxy (Thermo Fisher Scientific, 14301). Alternatively, CENP-ALAP chromatin was immunoprecipitated using mouse anti-GFP antibody (clones 19C8 and 19F7, Monoclonal Antibody Core Facility at Memorial Sloan-Kettering Cancer Center, New York, USA) coupled to Dynabeads M-270 epoxy. Chromatin extracts were incubated with antibody-bound beads for 16 h at 4 °C. Bound complexes were washed once in buffer A (20 mM HEPES at pH 7.7, 20 mM KCl, 0.4 mM EDTA and 0.4 mM DTT), once in buffer A with 300 mM KCl and finally twice in buffer A with 300 mM KCl, 1 mM DTT and 0.1% Tween 20. In the second step, TAP–chromatin complexes were incubated for 16 h in final wash buffer with 50 μl recombinant tobacco etch virus protease, resulting in cleavage of the TAP tag and elution of the chromatin complexes from the beads. Alternatively, CENP-ALAP chromatin was eluted from the beads by cleaving the LAP tag using PreScission protease (4 h, 4 °C). CENP-A-containing chromatin was immunoprecipitated from DLD1 cells with auxin-degradable CENP-AAID and doxycycline-inducible CENP-AWT (ref. 37) using Abcam ab13939 CENP-A antibody coupled to Dynabeads M-270 epoxy.

DNA extraction

Following elution of the chromatin from the beads, Proteinase K (100 μg ml−1) was added and samples were incubated for 2 h at 55 °C. DNA was purified from Proteinase K-treated samples using a DNA purification kit following the manufacturer's instructions (Promega, Madison, WI, USA) and was subsequently analysed either by running a 2% low-melting agarose (APEX) gel or with an Agilent 2100 Bioanalyzer using the DNA 1000 kit. The Bioanalyzer determines the quantity of DNA on the basis of fluorescence intensity.

Quantitative real-time PCR

Quantitative real-time PCR was performed using SYBR Green mix (Bio-Rad) with a CFX384 Bio-Rad Real Time System. Primer sequences used in this study were the following: MRGPRE (forward) 5′-CTGCGCGGATCTCATCTTCC-3′ and (reverse) 5′-GGCCCACGATGTAGCAGAA-3′; MMP15 (forward) 5′-GTGCTCGACGAAGAGACCAAG-3′ and (reverse) 5′-TTTCACTCGTACCCCGAACTG-3′; HBE1 (forward) 5′-ATGGTGCATTTTACTGCTGAGG-3′ and (reverse) 5′-GGGAGACGACAGGTTTCCAAA-3′; Sat2 (forward) 5′-TCGCATAGAATCGAATGGAA-3′ and (reverse) 5′-GCATTCGAGTCCGTGGA-3′; α-satellite DNA (from chromosomes 1, 3, 5, 10, 12 and 16) (forward) 5′-CTAGACAGAAGAATTCTCAG-3′ and (reverse) 5′-CTGAAATCTCCACTTGC-3′ (ref. 41); α-satellite DNA variant from chromosome 8 (forward) 5′-TGAATGCGAGAGAGAAGTAA-3′ and (reverse) 5′-TCAAATATATCCAAATATCCA-3′; α-satellite DNA variant from chromosome 15 (forward) 5′-GTTGCACATTCCGGTTCATACA-3′ and (reverse) 5′-TTTCACCGTAGGCCTCAAAGGGCTCCAACT-3′; ectopic site 1 in chromosome 5 (forward) 5′-CCCTCCTGCCTGAAGATTTGAT-3′ and (reverse) 5′-AAAGCTTGGTGAGGGCAGTT-3′; ectopic site 2 in chromosome 14 (forward) 5′-GCTGTGTACTCCCGAACTCC-3′ and (reverse) 5′-GATCCTGTCCAGCTGCCAG-3′; ectopic site 3 in chromosome 1 (forward) 5′-TCAGTTTGCACCATCCCCTG-3′ and (reverse) 5′-GCTCTGACTCATGCTCCTACTG-3′; ectopic site 4 in chromosome 9 (forward) 5′-AGTGCCCTCTGAACGCTAAC-3′ and (reverse) 5′-ATTCCTCCCTGAGCTCCCAT-3′. Melting curve analysis was used to confirm primer specificity. To ensure linearity of the standard curve, reaction efficiencies over the appropriate dynamic range were calculated. Using the dCt method, we calculated fold enrichment of α-satellite DNA after immunopurification of CENP-ATAP chromatin, compared with its level in the bulk input chromatin. For the Repli-seq experiment, we used the dCt method to calculate fold enrichment of replicated DNA after immunopurification of BrdU-labelled DNA compared with its level in the bulk input DNA. Reported values are the means of two independent biological replicates with technical duplicates that were averaged for each experiment. Error bars represent s.e.m. To determine CENP-A levels at ectopic sites at G1 and mitosis, we used the dCt method to calculate fold enrichment of CENP-A-containing DNA compared with its level in the bulk input DNA. CENP-A levels at mitosis were normalized to levels at G1. Reported values are the means of two (G1) or three (mitosis) independent biological replicates with technical duplicates that were averaged for each experiment. Error bars represent s.e.m.
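The dCt calculation described above can be sketched as follows. This is a minimal illustration, not the analysis script used in the study: the function names, the assumption of equal template amounts for ChIP and input, the default amplification efficiency of 2 and all Ct values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def fold_enrichment(ct_ip, ct_input, efficiency=2.0):
    """Fold enrichment of a target in ChIP DNA relative to input DNA.

    Assumes equal template amounts in both reactions and a fixed
    per-cycle amplification efficiency (2.0 = perfect doubling).
    """
    d_ct = ct_input - ct_ip  # a lower Ct in the ChIP sample means more target
    return efficiency ** d_ct


def replicate_fold_enrichment(ip_cts, input_cts, efficiency=2.0):
    """Average technical replicates first, then compute fold enrichment."""
    return fold_enrichment(mean(ip_cts), mean(input_cts), efficiency)


def mean_and_sem(values):
    """Mean and standard error of the mean across biological replicates."""
    m = mean(values)
    sem = stdev(values) / sqrt(len(values)) if len(values) > 1 else 0.0
    return m, sem


if __name__ == "__main__":
    # Hypothetical Ct values: (ChIP duplicates, input duplicates) per replicate.
    biological_replicates = [
        ([20.2, 20.0], [25.1, 24.9]),
        ([20.6, 20.4], [25.3, 25.5]),
    ]
    folds = [replicate_fold_enrichment(ip, inp) for ip, inp in biological_replicates]
    print(mean_and_sem(folds))
```

Normalizing one condition to another (for example, mitosis to G1) would then amount to dividing the corresponding fold-enrichment values.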

Immunoblotting

For immunoblot analysis, protein samples were separated by SDS–PAGE, transferred onto polyvinylidene difluoride membranes (Millipore) and then probed with the following antibodies: rabbit anti-CENP-A (Cell Signaling, 2186 s, 1:1,000), rabbit anti-CENP-B (Millipore, 07-735, 1:200), mouse anti-α-tubulin (Abcam, DM1A, 1:5,000), rabbit anti-CAF1 p150 (Santa Cruz, sc-10772, 1:500), rabbit anti-CAF1 p60 (Bethyl Laboratories, A301-085A, 1:1,000), rabbit anti-CAF1 p48 (Bethyl Laboratories, A301-206A, 1:1,000) and rabbit anti-MCM2 (Abcam, Ab4461, 1:1,000). Following incubation with horseradish peroxidase-labelled antibody (GE Healthcare, NA931V or NA934V), horseradish peroxidase was detected using enhanced chemiluminescence substrate (Thermo Scientific, 34080 or 34096).

Immunofluorescence and live-cell imaging

1 × 10⁶ suspension cells were centrifuged and resuspended with PBS. 10⁵ cells were immobilized on glass slides by cytospin centrifugation for 3 min at 800 rpm. Cells were then fixed using ice-cold methanol at −20 °C for 10 min, followed by washing with cold PBS, and then incubated in Triton Block (0.2 M glycine, 2.5% fetal bovine serum, 0.1% Triton X-100, PBS) for 1 h. The following primary antibodies were used: mouse anti-GFP (Roche, 11814460001, 1:500), rabbit anti-CENP-B (Abcam 25734, 1:1,000) and human anti-centromere antibodies (ACA, Antibodies Inc, 15-234-0001, 1:500). The following secondary antibodies (Jackson Laboratories) were used for 45 min: donkey anti-human TR (1:300) and anti-mouse fluorescein isothiocyanate (FITC) (1:250). TAP fusion proteins were visualized by incubation with FITC-rabbit IgG (Jackson Laboratories, 1:200). Cells were then washed with 0.1% Triton X-100 in PBS, counterstained with 4,6-diamidino-2-phenylindole and mounted with mounting medium (Molecular Probes, P36934). Immunofluorescent images were acquired on a Deltavision Core system at ×60–100 magnification. 0.2 μm Z-stack deconvolved projections were generated using the softWoRx program. For live-cell imaging, cells were plated on high-optical-quality plastic slides (ibidi) and imaged using a CQ1 confocal quantitative image cytometer (Yokogawa) following addition of IAA. DNA was labelled with SiR-DNA (Cytoskeleton). IAA was added at the microscope stage.

Flow cytometry

Flow cytometry was used to determine the DNA content of the cells. 1 × 10⁶ cells were collected, washed in PBS and fixed in 70% ethanol. Cells were then washed and DNA was stained by incubating cells for 30 min with 1% fetal bovine serum, 10 μg ml−1 propidium iodide and 0.25 mg ml−1 RNase A in PBS followed by FACS analysis for DNA content using a BD LSR II flow cytometer (BD Biosciences).

ChIP-seq library generation and sequencing

ChIP libraries were prepared following Illumina protocols with minor modifications. To reduce biases induced by PCR amplification of a repetitive region, libraries were prepared from 80–100 ng of input or ChIP DNA. The DNA was end-repaired and A-tailed and Illumina TruSeq adaptors were ligated. Libraries were run on a 2% agarose gel. Since the chromatin was digested to mononucleosomes, following adaptor ligation the library size was 250–280 bp. The libraries were size selected for 200–375 bp. The libraries were then PCR-amplified using only five or six PCR cycles since the starting DNA amount was high. Resulting libraries were sequenced using 100 bp, paired-end sequencing on a HiSeq 2000 instrument according to the manufacturer’s instructions with some modifications (Illumina). Sequence reads are summarized in Supplementary Table 1.

Initial sequence processing and alignment

Illumina paired-end reads were merged to determine CENP-A- or H3-containing target fragments of varying length using PEAR software65, with standard parameters (P = 0.01, minimum overlap 10 bases, minimum assembly length 50 bp). Merged paired reads were mapped (BWA-MEM, standard parameters66,67) to the hg38 assembly (including alternative assemblies), which contains human α-satellite sequence models in each centromeric region (ref. 31; BioProject PRJNA193213). The highly repeated sequences in the centromere reference models preclude distinguishing between centromeric and pericentromeric sequences, the order of repeats in the models is arbitrarily assigned, and portions of the centromeres of the acrocentric chromosomes 13, 14, 21 and 22, as well as portions of centromeres of chromosomes 1, 5 and 19, contain nearly identical arrays that cannot be distinguished. Alignments to repeats, or multiple regions in the reference genome with the same mapping score, were assigned randomly. That is, across an entire reference model a given read may have equal probability of mapping across most of the repeat copies; however, the final assignment is random and so the alignments are distributed across the array. Mapping was performed under the same protocol for all ChIP and input samples. Reads were determined to contain α-satellite if they overlapped sites (BEDTools: intersect68) in the genome previously annotated as α-satellite (UCSC Table Browser69 was used to obtain a bed file of all sites annotated as ALR/α-satellite). Additionally, merged sequences were defined as containing α-satellite if they contained an exact match to at least two 18-base oligonucleotides specific to a previously published whole-genome sequencing read database of α-satellite, representing 2.6% of sequences from the HuRef genome29,33. Comparisons between the BWA mapping and 18-base oligonucleotide exact matching based strategies were highly concordant. 
Total α-satellite DNA content in the hg38 assembly was estimated using the UCSC RepeatMasker annotation69. To identify reads that aligned uniquely to low-frequency repeat variants, or base changes in a repeat unit that are only observed once within the hg38 reference model, we used mapping scores (MAPQ = 20, corresponding to a probability of 0.99 that a read is mapped correctly). A summary of reads obtained is shown in Supplementary Table 1.
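Two of the read-classification criteria above lend themselves to a short sketch: the Phred-scaled relationship between MAPQ and mapping confidence, and the requirement for exact matches to at least two α-satellite-specific 18-base oligonucleotides. The function names are hypothetical, and counting distinct 18-mers on the read strand only is an assumption made for illustration; the published pipeline may differ.

```python
def mapq_to_correct_probability(mapq):
    """Convert a Phred-scaled mapping quality to P(alignment is correct).

    MAPQ = -10 * log10(P_wrong), so MAPQ 20 corresponds to P_wrong = 0.01.
    """
    return 1.0 - 10 ** (-mapq / 10.0)


def contains_alpha_satellite(sequence, satellite_kmers, k=18, min_hits=2):
    """True if the sequence exactly matches at least min_hits distinct
    k-mers from a reference set of alpha-satellite-specific k-mers.

    Assumption: matches are counted as distinct k-mers on the given strand.
    """
    seq = sequence.upper()
    observed = {seq[i:i + k] for i in range(len(seq) - k + 1)}
    return len(observed & satellite_kmers) >= min_hits


if __name__ == "__main__":
    print(mapq_to_correct_probability(20))
```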

ChIP-seq peak calling

Enrichment peaks for ChIP experiments were determined using the SICER algorithm (v1.0.3)35 using relevant input reads as background, with stringent parameters previously optimized for human CENP-A23: threshold for redundancy allowed for chip reads, 1; threshold for redundancy allowed for control reads, 1; window size, 200 bp; fragment size, 150 bp; shift in window length, 150; effective genome size as a fraction of the reference genome of hg38, 0.74; gap size, 400 bp; e-value for identification of candidate islands that exhibit clustering, 1,000; false discovery rate controlling significance, 0.00001. All sequencing samples were normalized to their matched input control, that is, each G1 sample was normalized to the G1 input and the G2 sequencing samples to the G2 input. In parallel, MACS peak calling was performed (macs14)34, and wiggle tracks were created to represent the read depth of each dataset independently. Finally, we performed a rigorous evaluation of ectopic CENP-A peaks, or peaks predicted outside centromeric regions, using k-mer enrichment (previously described29). Each ectopic peak was reformatted into 50 bp sliding windows (in both orientations, with a slide of 1 bp). The normalized frequency of each 50-base oligonucleotide candidate window was evaluated in each ChIP-seq dataset relative to a normalized observed frequency in the corresponding background dataset. Scores were determined as the log-transformed normalized value of the ratio between ChIP-seq and background, and those with a score greater than or equal to 2 were included in our study as a high-confidence enrichment set.
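The k-mer enrichment step can be illustrated with the sketch below. It is not the published implementation (ref. 29): the function names, the pseudocount used to avoid division by zero and the choice of log base 2 are assumptions, since the text does not specify them.

```python
from collections import Counter
from math import log

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sliding_windows(seq, k=50):
    """Yield every k-base window (slide of 1 bp) in both orientations."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        yield window
        yield reverse_complement(window)


def kmer_counts(reads, k=50):
    """Count k-mers across reads; returns (Counter, total k-mer count)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts, sum(counts.values())


def enrichment_score(kmer, chip, background, pseudocount=1.0, base=2.0):
    """Log ratio of normalized k-mer frequency in ChIP versus background.

    The pseudocount and the log base are illustrative assumptions.
    """
    chip_counts, chip_total = chip
    bg_counts, bg_total = background
    chip_freq = (chip_counts[kmer] + pseudocount) / (chip_total + pseudocount)
    bg_freq = (bg_counts[kmer] + pseudocount) / (bg_total + pseudocount)
    return log(chip_freq / bg_freq, base)


def peak_is_enriched(peak_seq, chip, background, k=50, threshold=2.0):
    """True if any window of the peak reaches the score threshold."""
    return any(
        enrichment_score(w, chip, background) >= threshold
        for w in sliding_windows(peak_seq, k)
    )
```

In this sketch a peak is retained if at least one of its windows scores at or above the threshold, mirroring the requirement that a high-confidence peak overlap at least one enriched 50-base oligonucleotide.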

Analysis of CENP-A peak overlap with functional annotation

Ectopic CENP-A peak calls, that is those that did not overlap with centromeric α-satellite DNA, for HeLa, HT1080b (ref. 44) and HuRef (ref. 45) cells, were evaluated for enrichment with functional annotation if they were supported between replicate ChIP-seq experiments and overlapped at least one enriched 50-base oligonucleotide with a log-transformed normalized ratio of ≥2, or with a minimum standard ratio of fivefold. Resulting high-confidence ectopic peak calls were intersected (BEDTools: intersect68) with select functional datasets in the genome (UCSC Table Browser69). Peaks within genes were determined as such if 90% (–f 0.9) of the SICER peak intersected with GRCh38 RefSeq genes (including introns and exons). Peaks were determined at promoters if they intersected 1,000 bp upstream and downstream of genes (with minimum overlap of 50% of the SICER peak). To evaluate the role of expression, gene annotation was catalogued further based on intersection with ENCODE HeLa expression data with RefSeq gene annotations (where 22,211 RefSeq genes (40.5% of total) demonstrated at least 10 average reads/gene; and highly expressed RefSeq genes (10,033, or 18.3% of total) are defined as ≥100 average reads/gene; UCSC Table Browser69). To investigate peak overlap with sites of CTCF enrichment, we intersected peaks with two ENCODE replicate datasets (GEO GSM749729 and GSM749739), with minimum overlap of 20 bp. To study the overlap with sites of open chromatin, peaks were intersected with DNase I hypersensitive ENCODE datasets (GEO GSM736564 and GSM736510), with minimum overlap of 100 bases. Enrichment at H3K27me3 sites was determined by intersecting CENP-A binding data with H3K27me3 ChIP-seq (GEO GSM945208), with minimum overlap of 100 bases. Results were evaluated relative to a simulated peak dataset to test if observed peak counts were higher than expected by chance. 
Simulations were repeated 100 times to provide basic summary statistics: average, s.d., maximum/minimum, relative enrichment value and empirical P-value. To study the features of CENP-A non-centromeric preferential sites, CENP-A peaks were intersected with Broad Peaks of the following publicly available ENCODE HeLa S3 datasets with minimum overlap of 100 bases: H3K4me1 (GEO GSM798322), H3K4me2 (GEO GSM733734), H3K4me3 (GEO GSM733682), H2A.z (GEO GSM1003483), H3K9me3 (GEO GSM1003480), H3K27ac (GEO GSM733684), H3K27me3 (GEO GSM733696), H3K36me3 (GEO GSM733711), H3K79me2 (GEO GSM733669), H3K9ac (GEO GSM733756) and H4K20me1 (GEO GSM733689). CENP-A SICER peaks were also intersected with ENCODE high-occupancy target regions in the human genome assembly42 (GEO GSE54296).
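The comparison of observed peak overlaps with simulated peak sets can be sketched as below. How the simulated datasets were generated is not stated here, so the uniform random placement on a single linear coordinate system, the add-one empirical P-value estimator and all function names are assumptions made for illustration only.

```python
import random
from statistics import mean, stdev


def overlaps(a_start, a_end, b_start, b_end, min_overlap=100):
    """True if two intervals share at least min_overlap bases."""
    return min(a_end, b_end) - max(a_start, b_start) >= min_overlap


def count_overlapping_peaks(peaks, features, min_overlap=100):
    """Number of peaks overlapping at least one annotated feature."""
    return sum(
        any(overlaps(ps, pe, fs, fe, min_overlap) for fs, fe in features)
        for ps, pe in peaks
    )


def shuffle_peaks(peaks, genome_length, rng):
    """Place each peak at a uniformly random position, keeping its length."""
    shuffled = []
    for start, end in peaks:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def summarize_simulations(observed, simulated):
    """Summary statistics and an empirical P-value for the observed count."""
    avg = mean(simulated)
    at_least = sum(1 for count in simulated if count >= observed)
    return {
        "observed": observed,
        "average": avg,
        "sd": stdev(simulated) if len(simulated) > 1 else 0.0,
        "max": max(simulated),
        "min": min(simulated),
        "relative_enrichment": observed / avg if avg else float("inf"),
        # Add-one estimator so the P-value is never exactly zero (assumption).
        "empirical_p": (at_least + 1) / (len(simulated) + 1),
    }


def enrichment_test(peaks, features, genome_length, n_sim=100, seed=0,
                    min_overlap=100):
    """Compare observed overlaps with n_sim randomly placed peak sets."""
    rng = random.Random(seed)
    observed = count_overlapping_peaks(peaks, features, min_overlap)
    simulated = [
        count_overlapping_peaks(
            shuffle_peaks(peaks, genome_length, rng), features, min_overlap
        )
        for _ in range(n_sim)
    ]
    return summarize_simulations(observed, simulated)
```

A real analysis would shuffle peaks per chromosome and exclude unmappable regions; this sketch omits both for brevity.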

Repli-seq experiments

BrdU-labelled DNA across S phase was prepared as previously described46 with some modifications. Briefly, cells were synchronized using double thymidine block and release8. Following release from double thymidine block, cells were labelled with BrdU (Sigma, B5002) for 1 h by adding BrdU to the culture medium to a final concentration of 50 µM. BrdU was added immediately after release for labelling at early S (S0–S1), 3 h after release (S3) for mid-S (S3–S4) labelling or 6 h after release (S6) for labelling at late S (S6–S7). Following labelling with BrdU, genomic DNA was extracted, sonicated and heat denatured as previously described46. BrdU-labelled DNA was immunoprecipitated using an anti-BrdU antibody (BD Biosciences, 555627) coupled to magnetic Dynabeads M-270 epoxy (Thermo Fisher Scientific, 14301). Eluted single-stranded DNA was made into double-stranded DNA using random-prime extension (Thermo Fisher Scientific, Random Primers DNA labelling kit, 18187-013). Following cleanup of the double-stranded DNA (QIAgen QiaQuick PCR purification kit, 28104), the DNA was validated by performing quantitative real-time PCR using primers for MRGPRE, MMP15, HBE1 and Sat2 and comparison with the ENCODE HeLa S3 replication timing profile (GEO GSM923449). Libraries were then prepared as described above, and sequenced using the Illumina instrument according to the manufacturer’s instructions, with the exception that, following adaptor ligation, Repli-seq libraries were size selected between 250 and 500 bp.

Mass spectrometry identification of proteins associating with CENP-ATAP chromatin

CENP-ATAP was immunoprecipitated from the chromatin fraction of randomly cycling cells or late-S-synchronized cells as described above. Following bead washes, beads were snap frozen in liquid nitrogen. Samples were prepared for mass spectrometry and run on an LTQ Orbitrap Velos mass spectrometer (Thermo Scientific) as previously described70. Analysis was performed as previously described and is detailed in the Reporting Summary.

Statistics and reproducibility

For all experiments shown, n is indicated in the figure legends. Values represent the mean. Error bars, if shown, are s.e.m. (as indicated in the figure legends). For Fig. 6b,e,f,h,j,l, the experiment was repeated independently twice with similar results. For Fig. 6c,d,i,k,m, results of two biologically independent experiments are shown. Statistics source data for all graphical representations are available in Supplementary Table 4.

Reporting Summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.