Ribosomal DNA (rDNA) displays substantial inter-individual genetic variation in human and mouse. A systematic analysis of how this variation impacts epigenetic states and expression of the rDNA has thus far not been performed.
Using a combination of long- and short-read sequencing, we establish that 45S rDNA units in the C57BL/6J mouse strain exist as distinct genetic haplotypes that influence the epigenetic state and transcriptional output of any given unit. DNA methylation dynamics at these haplotypes are dichotomous and life-stage specific: at one haplotype, the DNA methylation state is sensitive to the in utero environment, but refractory to post-weaning influences, whereas other haplotypes entropically gain DNA methylation during aging only. On the other hand, individual rDNA units in human show limited evidence of genetic haplotypes, and hence little discernible correlation between genetic and epigenetic states. However, in both species, adjacent units show similar epigenetic profiles, and the overall epigenetic state at rDNA is strongly positively correlated with the total rDNA copy number. Analysis of different mouse inbred strains reveals that in some strains, such as 129S1/SvImJ, the rDNA copy number is only approximately 150 copies per diploid genome and DNA methylation levels are < 5%.
Our work demonstrates that rDNA-associated genetic variation has a considerable influence on rDNA epigenetic state and consequently rRNA expression outcomes. In the future, it will be important to consider the impact of inter-individual rDNA (epi)genetic variation on mammalian phenotypes and diseases.
The ribosome is one of the fundamental macromolecular complexes in all living cells, enabling translation to occur in the cytoplasm. The mature mammalian 80S ribosome consists of small (40S) and large (60S) subunits, both comprised of a different complement of proteins and RNA. Despite the essential and highly conserved role played by the ribosome, it is now clear that it can be compositionally diverse even within a given individual organism . It has been shown that the protein composition of the ribosome can vary developmentally  and that the ribosomal RNA (rRNA) components display inter- and intra-individual genetic variation [2, 3]. Such variation is thought to ultimately influence preferential translation of some mRNAs, i.e., the “ribosome filter hypothesis” .
With respect to rRNA variation in mammals, several studies have reported genetic variation within the 45S rDNA in human and mouse (Fig. 1A). The 45S rDNA codes for the 18S rRNA that is incorporated into the 40S subunit, and the 5.8S and 28S rRNAs that are incorporated into the 60S subunit (which additionally contains the 5S rRNA that is coded by the 5S rDNA present at an unlinked genomic location). Because of the multi-copy and multi-allelic nature of 45S rDNA, functional genomic analyses of this source of mammalian genetic variation have been difficult. Indeed, rDNA clusters are typically excluded from genome assemblies, and none of the large-scale genomic analyses of recent years has yielded any insights into rDNA genetic variation. Nonetheless, several smaller studies have noted inter-individual variation at the single-nucleotide level and copy number in mouse and human rDNA, and also epigenetic variation in the context of whole organism stress responses [2, 5,6,7,8]. Furthermore, our previous work showed that mammalian rDNA sequence variation can influence epigenetic states, thereby impacting transcriptional outputs in different biological contexts. However, a systematic analysis of how human and/or mouse rDNA-associated genetic variation influences associated epigenetic states, and rDNA transcriptional output, has thus far not been performed. We therefore had three main aims in this study, specifically to combine long- and short-read sequencing to establish: (i) how single-nucleotide genetic variation at the individual rDNA unit level influences epigenetic states and transcriptional outputs, (ii) whether rDNA copy number influences epigenetic states, (iii) if such genetic variation is relevant in examples of mammalian phenotypes.
Mouse rDNA exists as distinct genetic haplotypes
Using the C57BL/6J mouse strain, we focused on a ~15-kb region of the rDNA termed the “coding unit” (Fig. 1A; the intergenic sequence, IGS, contains high repeat sequence density and was not analyzed in detail). Short-read whole-genome sequencing (WGS) analysis of four different C57BL/6J mice identified 88 different coding unit intra- and inter-individual single-nucleotide variants (SNVs) (Additional file 1; Additional file 2: table S1; inclusion of indels did not affect SNV allele frequencies and were not explicitly considered further). To achieve a deeper understanding of rDNA genomic architecture, we sequenced a C57BL/6J mouse embryonic fibroblast (MEF) line using ultra-long-read whole-genome Nanopore technology , obtaining 932,683 reads (N50 ~ 72 kb), of which 1760 contained one or more rDNA coding units within a single read (Additional file 1; Additional file 2: table S2). We confirmed 87/88 short-read rDNA coding unit SNVs in these reads (Additional file 2: table S1). Previously, we reported a SNV at position -104 (relative to TSS; Fig. 1A) in the C57BL/6J rDNA that is associated with differential promoter methylation: “A” variants at -104 are associated with 30–80% methylation, whereas “C” variants display < 25% methylation at the promoter [6, 7]. Using -104 as a starting point to explore the possibility of larger haplotypes within the rDNA, analysis of the SNVs throughout the coding unit revealed 4 different rDNA haplotypes that we term “ATA,” “ATG,” “CCA,” and “CTA,” in approximately equal proportions (Fig. 1B). ATA and ATG are more genetically similar to each other compared to either CCA or CTA (Fig. 1B; Additional file 1). Independent support for these haplotypes was obtained by pairwise correlation analysis of the relevant haplotype-associated SNVs in the four different short-read kidney WGS datasets (Additional file 3: Fig. S1; Additional file 2: table S3). Analysis of reads containing two or more complete rDNA units revealed that adjacent units tend to correspond to the same rDNA haplotype (Fig. 1C). Although further refinement of C57BL/6J rDNA haplotypes may be possible, the analyses below demonstrate that the rDNA haplotypes underlie bona fide molecular differences.
The impact of rDNA haplotypes on functional genomic outcomes
Nanopore sequencing permits direct assessment of DNA methylation in unamplified DNA. Strikingly, we found that the ATA haplotype displays significant DNA methylation (≳ 60%) across the length of the coding unit, CCA shows low methylation levels (≲ 20%), and the other two haplotypes are largely unmethylated (Fig. 2A). Analysis of individual reads revealed that individual coding units are either almost completely methylated or unmethylated (Additional file 3: Fig. S2), and haplotype-specific methylation differences do not extend into the adjoining IGS regions, which are generally hypermethylated (Additional file 3: Fig. S2). To provide further support for rDNA haplotype-specific methylation using an orthologous method, we analyzed 4 different kidney whole-genome bisulfite sequencing (WGBS) and 7 different sperm reduced representation bisulfite sequencing (RRBS) C57BL/6J datasets (11 different mice) (Additional file 2: table S3). Short-read methods cannot directly define long-range (epi)genetic patterns, and in our case, the major ATA-defining SNV is an “A” at position 6832. However, if the proposed (epi)genomic architecture of the haplotypes is correct, then if we combine the ATA and ATG haplotypes into a single “A” haplogroup that can be distinguished from the CCA and CTA haplotypes at multiple positions, then in short-read data the allele frequency and DNA methylation state at these positions should be predictable based on the combined individual allele frequencies and DNA methylation levels of ATA and ATG at position 6832. This is indeed what we observe (Fig. 2B; Additional file 3: Fig. S3). The WGBS and RRBS datasets also demonstrate that while relative allele frequencies and DNA methylation show inter-individual variation—in particular, the CCA haplotype is occasionally associated with methylation levels up to ~ 20%—it is only the ATA rDNA haplotype that consistently shows substantial methylation in vivo.
DNA methylation at the mouse rDNA promoter has a strong effect on rRNA expression . If methylation is restricted largely to ATA haplotypes, then methylation observed at the rDNA promoter of “A” haplogroup should derive from ATA haplotypes only and correlate with the relative frequency of ATA in the rRNA. We performed “rRNA-seq,” which omits the rRNA depletion step in the standard mRNA-seq protocol (Additional file 1). Also, because of the sheer abundance of cellular rRNA, even internal transcribed spacer (ITS) RNA is readily detected (ITS2 contains the ATA-defining SNV at 6832). We observed a strong negative correlation between “A” haplogroup promoter methylation levels and the relative frequency of A’s at position 6832 in muscle rRNA-seq (Fig. 2C; Additional file 2: table S4). In fact, since DNA methylation levels within any given coding unit are similar along the length of the unit, methylation even in the vicinity (± 200bp centered on the variant) of position 6832 in the rDNA, should be correlated with the variant ratios observed at position 6832 in rRNA-seq, which is what we observe using 4 different kidney WGBS datasets (Fig. 2C). Next, we performed Cleavage Under Targets & Tagmentation (CUT&Tag—a recently developed method to profile histone modifications and chromatin proteins with improved signal to noise ratio ) analysis on kidney samples, finding that only ATA haplotypes display combined H3K9me3 enrichment/H3K27me3 depletion, consistent with previous reports of the relationship between DNA methylation and these histone modifications in other genomic regions  (Fig. 2D; Additional file 3: Fig. S4; Additional file 2: table S5). Finally, it is known that the Upstream Binding Transcriptional factor (UBTF) binding to unmethylated rDNA is required for rRNA expression . Re-analysis of a previously published dataset of UBTF binding in B cells of C57BL/6J mice  showed that ATA is depleted for UBTF relative to the non-ATA haplotypes (Fig. 2E). Collectively, these analyses demonstrate key functional genomic features of ATA vs non-ATA rDNA haplotypes in a variety of tissue types.
DNA methylation dynamics at rDNA are dichotomous and life-stage specific
Epigenetic silencing of rDNA is an integral component of the stress response in all eukaryotic cells. Recent studies using the C57BL/6J strain showed that pre-weaning exposure to nutritional stress, e.g., maternal protein restriction, obesogenic or high-fat diet, induces DNA hypermethylation of rDNA that persists into adulthood [6, 7, 15]. We re-analyzed our previous RRBS data from C57BL/6J individuals exposed to maternal protein restriction prior to weaning  and found that the hypermethylation occurs specifically at the ATA variants (Fig. 3A; Additional file 3: Fig. S5). In the GEO database, RRBS data is available for one other pure C57BL/6J mouse model of nutritional stress in which the offspring were exposed to a high-fat diet during gestation+lactation, or during gestation+lactation+post-weaning . In both experimental groups, relative to controls, rDNA hypermethylation is observed at ATA variants only (Fig. 3A). The early life nutritional stress models demonstrate that certain stimuli induce methylation and silencing of ATA rDNA units. But, can ATA units also lose methylation and re-activate? We re-analyzed Dahlet et al.’s RRBS data of C57BL/6J CRISPR-based Dnmt1 knockout (Dnmt1 KO) 8.5dpc embryos  and found that Dnmt1 KO embryos show virtually no methylation at any of the rDNA haplotypes (Fig. 3B, left panel; Additional file 3: Fig. S6). In standard mRNA-seq protocols, rRNA is specifically removed. However, this depletion is never completely efficient and we find millions of rRNA reads in the Dahlet et al. mRNA-seq data (Additional file 1). Our own analysis of 22 matched C57BL/6J muscle mRNA-seq and rRNA-seq revealed an excellent correlation of rRNA variant frequencies (R ≥ 0.98 in all cases; Additional file 3: Fig. S7). We then examined rRNA expression in Dahlet et al.’s mRNA-Seq data and found that ATA variants are expressed at considerably higher relative levels in the Dnmt1 KOs, proving that methylated ATA variants are not irreversibly silenced (Fig. 3B).
Interestingly, previous studies show that a low-protein diet in C57BL/6J mice post-weaning does not induce rDNA hypermethylation [7, 19]. The study by Cannon et al. also includes a fourth group in which the mice were exposed to a high-fat diet post-weaning only, and in this group, evidence for diet-induced epigenetic differences at rDNA is weaker  (Additional file 3: Fig. S5). To further explore the idea that the epigenetic state of ATA rDNA shows only limited dynamics in adulthood, we leveraged the RRBS dataset of Petkovich et al., , representing 193 different C57BL/6J mice spanning an age range from 3 to 35 months. Aging-associated DNA methylation dynamics are observed genome-wide in a large range of mammalian species [20,21,22], including at rDNA in humans and rodents [23,24,25,26]. In the Petkovich et al. dataset, ATA showed no directional change with age (Fig. 3C; Additional file 3: Fig. S8, S9). However, methylation at non-ATA haplotypes, and in particular CCA and CTA, displays a positive correlation with aging (Fig. 3C). By leveraging single-molecule-level data, we found that this was primarily driven by an increase in DNA methylation entropy within individual DNA molecules at non-ATA haplotypes (Fig. 3D; Additional file 3: Fig. S10, S11) [27,28,29]. Therefore, epigenetic dynamics at rDNA haplotypes are life-stage specific and dichotomous: the ATA haplotype displays environmentally induced epigenetic dynamics during early development, but is less susceptible to further perturbations in later life. On the other hand, non-ATA haplotypes entropically accumulate methylation during aging.
rDNA epialleles are present in other mouse strains and human
We then studied 5 additional inbred mouse strains (Fig. 4A). For each strain, we generated kidney WGS, WGBS, rRNA-seq, and droplet digital PCR (ddPCR) data from each of 6 different adult males (117 different datasets, as 3 were discarded post-QC) (Additional file 2: table S3). We confirmed that for any given sample, > 82% of the rDNA coding unit SNVs called in the WGS data are also found in the WGBS data (Additional file 2: table S6-10; Additional file 3: Fig. S12). We then asked which of these SNVs are associated with allelic methylation differences, i.e., are “epivariants” (Wilcoxon rank sum test, FDR < 0.01), strain-specific, and/or common across multiple strains (Fig. 4A; Additional file 3: Fig. S13; Additional file 2: table S11). Analysis of matched rRNA-seq data showed that epivariant-associated methylation differences throughout the coding unit impact variant frequencies in the rRNA, with the exceptions being 129S1/SvImJ and C3H/HeJ (Fig. 4B; Additional file 3: Fig. S14; p-value = 3.464 × 10−6, paired Wilcoxon rank sum test on the difference of correlation coefficients). We noted that these two strains also showed the lowest levels of rDNA methylation (Fig. 4A). Given the known positive correlation between increasing total rDNA copy number (CN) and epigenetic silencing in lower organisms , we considered the possibility that CN might account for strain-specific differences in total rDNA methylation levels. Therefore, we calculated CN from three independent datasets—WGS, WGBS, and ddPCR—for each individual mouse, as CN measurements in mammals are known to be technically challenging  (see Additional file 1 for CN calculations; Additional file 3: Fig. S15). Indeed, we found a significant positive correlation between total rDNA CN and DNA methylation across the 6 different inbred strains (Fig. 4C; Additional file 3: Fig. S15).
To ask if human rDNA displays similar genetic-epigenetic relationships, we generated WGBS and rRNA-seq data for 48 different human lymphoblastoid cell lines (LCLs) derived from “Gambian in Western Division – Mandinka” adult individuals that were sequenced in the 1000 genomes project  (Additional file 2: table S12). We first confirmed that for any given sample, > 95% of the rDNA coding region SNVs called in the published WGS data were also called in our WGBS data (Additional file 3: table S12). Even in this relatively small number of samples for an outbred population, we identified two epivariants at FDR < 0.01 (Fig. 5A). DNA methylation in the vicinity of these sites (-413 located in the promoter and 7980 located in the 28S) was positively associated with total rDNA CN (Fig. 5B). We then analyzed the 7980 variant in rRNA-seq datasets for these samples and found DNA methylation in the vicinity of 28S negatively associated with variant representation in rRNA (Fig. 5C). Finally, we re-analyzed published ultra-long-read Nanopore data from a GWD sample (Additional file 1). Interestingly, in contrast to the mouse rDNA, we did not find extensive evidence for genetic haplotypic structure within a unit. However, comparison of adjacent units revealed strong epigenetic relatedness like in the mouse and also some evidence that at least the 7980 position shows genetic similarity between neighboring units (Fig. 5D).
Here we have shown that the genetic identity of a rDNA unit has a strong probabilistic influence on the homeostatic epigenetic state. The CN data suggests that the emergence of rDNA epialleles could be linked to the need of the genome to silence extra copies of rDNA. However, the silencing is not absolute, reminiscent of the epigenetic metastability associated with some mammalian retroelements [33, 34]. Could the loss/gain of specific rDNA variants underlie the CN differences? It has been shown that the number of different chromosomes harboring rDNA clusters can vary from 3 to 5 among different inbred mouse strains . The C57BL/6J genome contains rDNA clusters on 4 separate chromosomes , leading to the possibility that each chromosome harbors a different variant given that we find 4 different rDNA haplotypes in the C57BL/6J strain. Interestingly, the C3H strain has only 3 rDNA clusters  and shows very few DNA methylation-based rDNA epialleles.
But could non-rDNA methylation-based epialleles exist in the mouse? We performed CUT&Tag analysis for H3K9me3 and H3K27me3 in the 129S1/SvImJ strain which has very few DNA methylation-based epivariants in rDNA, and even these are of very small effect size. However, we did not find any patterns/clusters of the histone modifications (unlike what is observed for C57BL/6J), suggesting that there are no preferentially epigenetically silenced genetic variants in the absence of DNA methylation, at least in 129S1/SvImJ mice (Additional file 3: Fig. S16). Our 129S1/SvImJ data also suggests that in some mouse strains, DNA methylation plays little, if any, role in silencing of rDNA copies.
For human rDNA, we found limited evidence of genetic haplotypes within a unit, or genetic similarity of adjacent units, consistent with a recent study . It is previously been noted that inter-chromosomal rDNA recombination is greater in human than in the mouse , which would be consistent with our findings. However, there was clear evidence for epiallelic effects within a unit, albeit not as marked as for the mouse, and also epigenetic states of neighboring human rDNA units are similar. This raises the intriguing possibility that the entire cluster of rDNA units on any given chromosome shares a similar epigenetic identity, and this property is conserved among mammalian genomes. In the future, it will be interesting to analyze other mammalian genomes to elucidate what determines higher-order genetic and/or epigenetic patterns of mammalian rDNAs.
With regard to the molecular consequences of rDNA (epi)genetic variation in models of nutritional stress and aging, a tantalizing possibility is the production of variant ribosomes that have subtle but measurable effects on translational output, in line with the ribosome filter hypothesis [1, 4]. It has been shown that endogenous rRNA sequence variation regulates stress response gene expression in bacteria [38, 39]. Previous studies have shown that human and mouse coding subunit rDNA genetic variation is located in rRNA regions with known roles in translation . An initial Polysome-seq analysis shows that the ratio of 28S rRNA variants incorporated into ribosomes is dependent on the ratio at which they are expressed (Additional file 3: Fig. S17). In eukaryotes, there are examples of variation in either ribosomal proteins or rRNA modifications influencing preferential translation of some mRNAs , yet the contribution of rRNA sequence variation has remained largely unexplored and will require large-scale translatome analyses in the future.
Here we have shown that genetic variation at mouse and human ribosomal DNA influences epigenetic states and associated transcriptional outcomes. Currently, this source of genetic variation is largely overlooked in large-scale studies and thus it is possible that rDNA-associated genetic variation underlies some apparently “epigenetic” phenomena  and/or contributes, in trans, to the “missing heritability” in some human phenotypes and diseases.
Mouse embryonic fibroblasts
Mouse embryonic fibroblasts were made from a 13.5 dpc male C57BL/6J embryo following the protocol of the Jacks Lab (available online at http://web.mit.edu/jacks-lab/protocols/Making_MEFs_tables.html). Immortalization was done using the PA317 (λtsA58U19-neo) (G418 0.75 mg/ml) viral vector.
Ultra-long Nanopore sequencing
Ultra-long Nanopore sequencing was performed using the protocol of Jain et al. .
Short-read sequencing libraries
Short-read sequencing libraries were generated using the following kits and according to the manufacturer’s protocols: (i) whole-genome sequencing (WGS)—NEBNext® Ultra™ II DNA Library Prep Kit for Illumina, NEB, USA; (ii) whole-genome bisulfite sequencing (WGBS)—Accel-NGS® Methyl-Seq DNA Library Kit, Swift BioSciences, USA; RNA-seq—NEBNext® Ultra™ II Directional RNA Library Prep Kit for Illumina, NEB, USA. rRNA-seq libraries were made like the RNA-seq but without any depletion. All sequencing was performed on an Illumina NovaSeq6000 by Novogene, Cambridge, UK. For WGS and WGBS, we generated data in the range of ~ 15–25X genome-wide coverage on average, corresponding to ~ 350–1350X at the rDNA. We have previously shown that this level of sequence coverage at rDNA results in data that is extremely well correlated with targeted PCR-based approaches (in which the coverage is not limited) [6, 7]. The methylation values for muscle in Fig. 2C are derived from multiplex bisulfite PCR data generated in . Bisulfite PCR sequencing (Bis-PCR-seq) was performed on DNA from muscle, as in . DNA was bisulfite converted using the EZ-96 DNA Methylation™ Kit (Zymo, Cat. D5003). Targeted amplification was performed using the FastStart High Fidelity PCR System, dNTPack (Sigma-Aldrich, cat. 4738284001) in the 48.48 layout on the Fluidigm C1 system (Fluidigm, USA), a microfluidics platform. Library preparation was performed using the same kit including 4 μl of Access Array BArcode Library Primer and 1 μl of PCR product diluted 1:100. Libraries were sequenced with Illumina NextSeq (75 bp, single-end).
Droplet digital PCR
Droplet digital PCR was used to measure the total rDNA copy number across different strains using mouse kidney tissue. Probes and primers were designed against the genomic mouse 18S rDNA sequence (GenBank: BK000964.3, positions 4008-5877). The 18S targeting probe was attached to the 5′ fluorescent dye, FAM, and a 3′ nonfluorescent quencher, NFQ. The mouse transferrin receptor, Tfrc, was used as a single copy reference and was targeted by a pre-designed assay with a HEX fluorescent dye and an Iowa Black quencher (Bio-Rad, Assay ID: dMmuCNS420644255). The restriction enzyme, AluI (NEB, Cat. RO137S), was used to separate each rDNA gene copy to minimize the number of copies per droplet. AluI was diluted with 1X CutSmart Buffer to give a concentration of 1 U/μl. In total, 50 ng of DNA was used for every AluI digestion. A total of 20 units of AluI for every 1 μg of DNA was used and the reaction was incubated at 37 °C for 1 h. One nanogram of digested DNA was then added to each sample well. The ddPCR reactions were prepared in duplicate in a ddPCR™ 96-well plate (Bio-Rad, Cat. 12001925). The components of the reaction mix per reaction were 11 μl iTaq Universal Probe Supermix (Bio-Rad, Cat. 1725130), 0.275 μl FAM, 1.926 μl HEX, and an 8.8 μl solution of water and DNA. The automated procedure of ddPCR was carried out using the QX200™ AutoDG™ ddPCR system from Bio-Rad. The automated system involved a droplet generator machine (Bio-Rad, Cat. 1864101) DG32™ Automated Droplet Generator Cartridges (Bio-Rad, Cat. 1864108), Automated Droplet Generation Oil for Probes (Bio-Rad, Cat. 1864110), and Pipet Tips for the AutoDG™ System (Bio-Rad, Cat. 1864120). The new plate with droplets was sealed with a pierceable PCR foil at 180 °C for 5 s using PX1 PCR Plate Sealer (Bio-Rad, Cat. 1814000). The PCR reaction was then carried out using the C1000 Touch™ Thermal Cycler (Bio-Rad, Cat. 1851197). PCR settings were as follows: 10 min of 95 °C for initial denaturation, cycle of 40 repetitions, 94 °C for 30 s, annealing temperature 57 °C, and lid temperature 105 °C. After PCR, the plate was processed in a plate reader and the data was generated with Quantasoft Software.
CUT&Tag-seq was performed according to the protocol of Kaya-Okur et al.  with modifications to tissue processing as described below. Experiments were performed in duplicate from two independent mouse kidney tissues in parallel to minimize technical variation. To adapt CUT&Tag for mouse tissue sections, flash-frozen mouse liver tissues (approximately 3–4 mm size) were manually homogenized with tight homogenizers in wash buffer (20 mM HEPES pH 7.5, 150 mM NaCl, 0.1% BSA, 0.5 mM Spermidine and cOmplete EDTA-free protease inhibitor tablet) into a homogenous suspension of intact cells. Cells were transferred to 1.5-ml low DNA binding tubes (Eppendorf), and solutions were exchanged on a magnetic stand (DynaMag-2, Thermo Fisher Scientific). Cells were pelleted by centrifugation for 3 min 600×g at room temperature and resuspended in 500 μl of ice-cold NE1 buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 0.5 mM spermidine, 1% Triton X-100, and 20% glycerol and cOmplete EDTA-free protease inhibitor tablet) and let it sit for 10 min on ice. Nuclei were pelleted by centrifugation for 4 min 1300×g at 4 °C and resuspended in 500 μl of wash buffer and the wash buffer by placing the tubes on a magnet stand to clear and withdraw the liquid, then resuspended in 1.0 ml wash buffer and held on ice until beads are ready. In total, 10 μl of BioMag Plus Concanavalin-A-conjugated magnetic beads (ConA beads, Polysciences, Inc) in binding buffer (20 mM HEPES-KOH pH 7.9, 10 mM KCl, 1 mM CaCl2, and 1 mM MnCl2) was added to each tube containing cells and rotated on an end-to-end rotator for 10 min. After a quick spin to remove liquid from the cap, tubes were placed on a magnet stand to clear and withdraw the liquid, and 800 μl of antibody buffer containing 1 μl of primary antibodies (normal rabbit IgG, Santa Cruz Cat no sc-2027, H3K27me3 (Millipore, Catalogue number 07-449, Lot DAM1703508), and H3K9me3 (Diagenode, Catalogue number C15410193, Lot A.0219P)) was added and incubated at 4 °C overnight in a nutator. Secondary antibodies (guinea pig α-rabbit antibody, Antibodies online cat. no. ABIN101961) were added 1:100 in Dig-wash buffer (5% digitonin in wash buffer) and squirt in 100 μl per sample while gently vortexing to allow the solution to dislodge the beads from the sides and incubated for 60 min on a nutator. Unbound antibodies were washed in 1 ml of Dig-wash buffer for a total of three times. In total, 100 μl of (1:250 diluted) protein-A-Tn5 loaded with adapters (kind gift from Steven Henikoff lab) in Dig-300 buffer (20 mM HEPES pH 7.5, 300 mM NaCl, 0.5 mM spermidine with Roche cOmplete EDTA-free protease inhibitor) was placed on a nutator for 1 h and washed three times in 1 ml of Dig-300 buffer to remove unbound pA-Tn5. Then, 300 μl Tagmentation buffer (Dig-300 buffer + 5 mM MgCl2) was added while gently vortexing and incubated at 37 °C for 1 h on an incubator. Tagmentation was stopped by adding 10 μl 0.5 M EDTA, 3 μl 10% SDS, and 2.5 μl 20 mg/ml Proteinase K to each sample. All were mixed by full speed vortexing for ~ 2 s and incubated for 1 h at 55 °C to digest. DNA was purified by phenol:chloroform extraction using phase lock tubes followed by ethanol precipitation. Libraries were prepared using NEBNext HiFi 2x PCR Master mix (Cat number M0541S) with 72 °C gap filling step followed by 13 cycles of PCR with 10-s combined annealing and extension for enrichment of short DNA fragments.
Preparation of RNAs from polysome fractions
Mouse liver tissue extracts were prepared as previously reported , using polysome extraction buffer (20 mM Hepes-NaOH (pH 7.4), 130 mM NaCl, 10 mM MgCl2, 1% CHAPS, 0.2 mg/ml heparin, 5% glycerol, 2.5 mM DTT, 50 μg/ml cycloheximide, 20 U SUPERase In RNase inhibitor, cOmplete EDTA-free Protease inhibitor). Ribo Mega-SEC run was carried as previously reported , by the flow rate of 0.4 ml/min, and polysome fraction (1.2 ml) was collected from 16.5 to 19.5 min. RNAs in the polysome fraction were extracted by TRIzol LS reagent (SIGMA) and precipitated with isopropanol containing glycogen. Precipitated RNAs were purified further by LiCl precipitation.
Mouse strains used in this study were ordered from Charles River, UK. All mice were 6 weeks of age when delivered and were allowed to acclimatize for 2 weeks, after which they were killed by carbon dioxide. The mice used in Fig. 2C and Additional file 3: Fig. S7 and S17 are from Reference 7. DNA and RNA from tissues were extracted using Qiagen kits and following the manufacturer’s protocols. Aside from the C57BL/6J mouse strain, the other mouse inbred strains were selected to span a range of genetic and copy number variation in the rDNA. Strain-specific copy number estimates from Parks et al. , and WGS sequencing data from the Mouse Genomes Project were obtained for 11 different strains: 129S1/SvImJ, A/J, BALB/cJ, C3H/HeJ, C57BL/10J, C57BL/6N, CAST/EiJ, CBA/J, DBA/2J, FVB/NJ, and MOLF/EiJ . Sequencing reads were processed as indicated below to obtain rDNA SNVs for each strain (see “Short read data processing” in Additional file 1). Allele frequencies at all such SNV positions were then used to cluster strains with R’s Ward hierarchical clustering method (Additional file 3: Fig. S18). Alongside copy number estimates, these clusters served as the main support for selecting 129S1/SvImJ, A/J, C3H/HeJ, C57BL/6N, and CAST/EiJ as additional strains to consider in the current study.
Human lymphoblastoid cell lines (LCLs)
Human lymphoblastoid cell lines (LCLs) of the Gambian in Western Division – Mandinka (GWD) population obtained from the Coriell Institute (New Jersey, USA) were used for all human experiments. Cell lines were seeded at a density of ~ 200,000 cells/ml in RPMI 1640 + GlutaMAX (Gibco; 61870-010) supplemented with 15% fetal bovine serum (Gibco; 10270-106) and a 1% penicillin and streptomycin mix (Gibco; 15140-122). All cell cultures were kept in 37 °C incubators under 5% carbon dioxide conditions. For all downstream experimental analyses, cell lines were pelleted, washed twice through with phosphate-buffered saline (Sigma; P4417), and stored at -80 °C. Whole-genome sequencing data from the 1000 genomes project was obtained for the 119 adult samples from the Gambian Mandinka population (GWD) with cell lines available from the Coriell Institute for Medical Research (all except HG02756). Sequencing reads were processed as indicated below to estimate both rDNA variation and copy number (see “Short-read data processing” and “Total rDNA copy number analysis” in Additional file 1). Twenty-four male and 24 female samples were then selected to maximize the range of rDNA variation and copy number in further analyses (Additional file 2: table S12).
Unless explicitly stated, all analyses were conducted using in-house scripts implemented in R version 4.0.2. An extended description of the data analysis methods in this study can be found in Additional file 1, including detailed explanations of the rDNA reference sequences, command line parameters, and mathematical formulae employed.
Short-read sequencing data were first analyzed using fastqc version 0.11.9 to identify potential issues, and successful libraries were then trimmed for both base quality and adaptor removal using trimgalore version 0.6.5. Alignments to the reference sequences were performed using bowtie2 version 2.4.1 for non-bisulfite-converted DNA data (WGS, ChIP-Seq, and CUT&Tag), bismark for bisulfite-converted data (version 0.7.12 for Bis-PCR-seq and version 0.22.1 for WGBS, RRBS) with underlying bowtie2, and STAR version 2.7.0f for RNA data. Alignment output files were then sorted, indexed, and filtered to retain only reads aligned to the appropriate rDNA unit reference using samtools version 1.10. SNVs were called on all non-bisulfite-converted datasets using lofreq version 2.1.5. For WGBS and RRBS data, CpG methylation estimates were obtained from the bismark alignments and then fed to blink, an in-house python tool for estimating allele-specific methylation and SNV frequencies, described more in detail in Additional file 1 (see “blink: Allele-specific methylation and frequency from bisulfite-converted data”). blink was employed to obtain allele-specific methylation values from both mouse kidney and human LCL data (see “Analysis of mouse strains kidney data” and “Analysis of human LCL data,” respectively, in Additional file 1).
Total rDNA copy number estimates from ddPCR were obtained using the Quantasoft software in default mode. Estimates from short-read sequencing data (WGS and WGBS), on the other hand, were obtained following the procedure Gibbons et al.  suggest (see “Total rDNA copy number analysis” in Additional file 1).
Ultra-long-read Nanopore libraries sequenced from MEFs were initially basecalled using albacore version 2.4.0 and aligned to the 18S and 28S regions from the BK000964.3 mouse rDNA reference using the minimap2 python interface mappy. The raw signal fast5 files for those sequencing reads that map to such regions were later re-basecalled using guppy version 4.2.2 and mapped to a whole-genome plus rDNA mouse reference using minimap2 version 2.17. Genetic variation and CpG methylation were estimated from the rDNA reads using megalodon, whose output files were further processed as described in Additional file 1 (see “MEF ultra-long read Nanopore data processing and analysis”) to obtain putative rDNA haplotypes and their corresponding methylation levels. Haplotype-specific alleles identified from the MEFs were then employed to assign short Illumina reads to haplotypes (see “Haplotype-specific analysis of C57BL/6J kidney and muscle data” and “Processing and analysis of C57BL/6J publicly-available datasets” in Additional file 1).
Publicly available MinION ultra-long-read Nanopore data for human sample HG02723 was obtained from Circulomics (https://www.circulomics.com/datasets). Raw fast5 files were basecalled using guppy version 4.2.2 and aligned to the human whole-genome plus rDNA reference sequence with minimap2. Reads mapping to the rDNA were considered for further analysis with megalodon version 2.2.9, using a reference sequence artificially expanded to include 20 back-to-back rDNA units. Short-read SNVs were obtained by first reconstructing with samtools fastq the raw sequencing reads from the CRAM file available from the 1000 genomes project, and then trimming, aligning, and variant calling as previously described. These SNVs were then used as input in megalodon to obtain per-read variant and CpG methylation calls, which were subsequently processed as described in Additional file 1 (see “Human ultra-long read Nanopore data processing and analysis”) to establish potentially haplotypic structures.
Availability of data and materials
All data generated for this study is available from BioProject ID: PRJNA733656.
Genuth NR, Barna M. The discovery of ribosome heterogeneity and its implications for gene regulation and organismal life. Mol Cell. 2018;71(3):364–74.
Parks MM, Kurylo CM, Dass RA, Bojmar L, Lyden D, Vincent CT, et al. Variant ribosomal RNA alleles are conserved and exhibit tissue-specific expression. Sci Adv. 2018;4(2):eaao0665.
Tseng H, Chou W, Wang J, Zhang X, Zhang S, Schultz RM. Mouse ribosomal RNA genes contain multiple differentially regulated variants. PLoS One. 2008;3(3):e1843.
Mauro VP, Matsuda D. Translation regulation by ribosomes: increased complexity and expanded scope. RNA Biol. 2016;13:748–55.
Gibbons JG, Branco AT, Yu S, Lemos B. Ribosomal DNA copy number is coupled with gene expression variation and mitochondrial abundance in humans. Nat Commun. 2011;11(5):4850.
Holland ML, Lowe R, Caton PW, Gemma C, Carbajosa G, Danson AF, et al. Early-life nutrition modulates the epigenetic state of specific rDNA genetic variants in mice. Science. 2016;353(6298):495–8.
Danson AF, Marzi SJ, Lowe R, Holland ML, Rakyan VK. Early life diet conditions the molecular response to post-weaning protein restriction in the mouse. BMC Biol. 2018;16(1):51.
Shao F, Liu X, Zhang X, Wang Q, Wang W. Methylation of 45S ribosomal DNA (rDNA) is associated with cancer and aging in humans. Int. J. Genomics. 2021;2021:8818007.
Jain M, Koren S, Miga KH, Quick J, Rand AC, Sasani TA, et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat Biotechnol. 2018;36(4):338–45.
Diesch J, Bywater MJ, Sanij E, Cameron DP, Schierding W, Brajanovski N, et al. Changes in long-range rDNA-genomic interactions associate with altered RNA polymerase II gene programs during malignant transformation. Commun Biol. 2019;2:39.
Santoro R, Grummt I. Molecular mechanisms mediating methylation-dependent silencing of ribosomal gene transcription. Mol Cell. 2001;8(3):719–25.
Kaya-Okur HS, Wu SJ, Codomo CA, Pledger ES, Bryson TD, Henikoff JG, et al. CUT&Tag for efficient epigenomic profiling of small samples and single cells. Nat. Commun. 2019;10(1):1930.
Rose NR, Klose RJ. Understanding the relationship between DNA methylation and histone lysine methylation. Biochim Biophys Acta. 2014;1839(12):1362–72.
O'Sullivan AC, Sullivan GJ, McStay B. UBF binding in vivo is not restricted to regulatory sequences within the vertebrate ribosomal DNA repeat. Mol Cell Biol. 2002;22(2):657–68.
Denisenko O, Lucas ES, Sun C, Watkins AJ, Mar D, Bomsztyk K, et al. Regulation of ribosomal RNA expression across the lifespan is fine-tuned by maternal diet before implantation. Biochim Biophys Acta. 2016;1859(7):906–13.
Cannon MV, Buchner DA, Hester J, Miller H, Sehayek E, Nadeau JH, et al. Maternal nutrition induces pervasive gene expression changes but no detectable DNA methylation differences in the liver of adult offspring. PLoS One. 2014;9(3):e90335.
Dahlet T, Argüeso Lleida A, Al Adhami H, Dumas M, Bender A, Ngondo RP, et al. Genome-wide analysis in the mouse embryo reveals the importance of DNA methylation for transcription integrity. Nat Commun. 2020;11(1):3153.
Petkovich DA, Podolskiy DI, Lobanov AV, Lee SG, Miller RA, Gladyshev VN. Using DNA methylation profiling to evaluate biological age and longevity interventions. Cell Metab. 2017;25(4):954–960.e6.
Shea JM, Serra RW, Carone BR, Shulha HP, Kucukural A, Ziller MJ, et al. Genetic and epigenetic variation, but not diet, shape the sperm methylome. Dev Cell. 2015;35(6):750–8.
Horvath S. DNA methylation age of human tissues and cell types. Genome Biol. 2013;14(10):R115.
Hannum G, Guinney J, Zhao L, Zhang L, Hughes G, Sadda S, et al. Genome-wide methylation profiles reveal quantitative views of human aging rates. Mol Cell. 2013;49(2):359–67.
Mammalian Methylation Consortium. www.biorxiv.org/content/10.1101/2021.01.18.426733v1.
Swisshelm K, Disteche CM, Thorvaldsen J, Nelson A, Salk D. Age-related increase in methylation of ribosomal genes and inactivation of chromosome-specific rRNA gene clusters in mouse. Mutat. Res. 1990;237(3-4):131–46.
Oakes CC, Smiraglia DJ, Plass C, Trasler JM, Robaire B. Aging results in hypermethylation of ribosomal DNA in sperm and liver of male rats. Proc. Natl. Acad. Sci. USA. 2003;100(4):1775–80.
D'Aquila P, Montesanto A, Mandalà M, Garasto S, Mari V, Corsonello A, et al. Methylation of the ribosomal RNA gene promoter is associated with aging and age-related decline. Aging Cell. 2017;16(5):966–75.
Wang M, Lemos B. Ribosomal DNA harbors an evolutionarily conserved clock of biological aging. Genome Res. 2019;29(3):325–33.
Landan G, Mendelson Cohen N, Mukamel Z, Bar A, Molchadsky A, Brosh R, et al. Epigenetic polymorphism and the stochastic formation of differentially methylated regions in normal and cancerous tissues. Nat. Genet. 2012;44:1207–14.
Xie H, Wang M, de Andrade A, Bonaldo MDF, Galat V, Arndt K, et al. Genome-wide quantitative assessment of variation in DNA methylation patterns. Nucleic Acids Res. 2011;39:4099–108.
Scherer M, Nebel A, Franke A, Walter J, Lengauer T, Bock C, et al. Quantitative comparison of within-sample heterogeneity scores for DNA methylation data. Nucleic Acids Res. 2020;48(8):e46.
Jack CV, Cruz C, Hull RM, Keller MA, Ralser M, Houseley J. Regulation of ribosomal DNA amplification by the TOR pathway. Proc Natl Acad Sci U S A. 2015;112(31):9674–9.
Hall AN, Turner TN, Queitsch C. Thousands of high-quality sequencing samples fail to show meaningful correlation between 5S and 45S ribosomal DNA arrays in humans. Sci. Rep. 2021;11(1):449.
1000 Genomes Project Consortium, Auton A, Brooks LD, Durbin RM, Garrison EP, Kang HM, et al. A global reference for human genetic variation. Nature. 2015;526(7571):68–74.
Kazachenka A, Bertozzi TM, Sjoberg-Herrera MK, Walker N, Gardner J, Gunning R, et al. Identification, characterization, and heritability of murine metastable epialleles: implications for non-genetic inheritance. Cell. 2018;175(6):1717.
Dominguez-Salas P, Moore SE, Baker MS, Bergen AW, Cox SE, Dyer RA, et al. Maternal nutrition at conception modulates DNA methylation of human metastable epialleles. Nat. Commun. 2014;5:3746.
Dev VG, Tantravahi R, Miller DA, Miller OJ. Nucleolus organizers in Mus Musculus subspecies and in the Rag mouse cell line. Genetics. 1977;86(2):389–98.
Eickbush TH, Eickbush DG. Finely orchestrated movements: evolution of the ribosomal RNA genes. Genetics. 2007;175(2):477–85.
Hori Y, Shimamoto A, Kobayashi T. The human ribosomal DNA array is composed of highly homogenized tandem clusters. Genome Res. 2021;31(11):1971–82.
Kurylo CM, Parks MM, Juette MF, Zinshteyn B, Altman RB, Thibado JK, et al. Endogenous rRNA sequence variation can regulate stress response gene expression and phenotype. Cell Rep. 2018;25(1):236–248.e6.
Song W, Joo M, Yeom JH, Shin E, Lee M, Choi HK, et al. Divergent rRNAs as regulators of gene expression at the ribosome level. Nat. Microbiol. 2019;4(3):515–26.
Bughio F, Maggert KA. The peculiar genetics of the ribosomal DNA blurs the boundaries of transgenerational epigenetic inheritance. Chromosome Res. 2019;27(1-2):19–30.
Yoshikawa H, Larance M, Harney DJ, Sundaramoorthy R, Ly T, et al. Efficient analysis of mammalian polysomes in cells and tissues using ribo mega-SEC. eLife. 2018;7:e36530.
Brenes AJ, Yoshikawa H, Bensaddek D, Mirauta B, Seaton D, et al. Erosion of human X chromosome inactivation causes major remodeling of the iPSC proteome. Cell Rep. 2021;35(4):109032.
Keane TM, Goodstadt L, Danececk P, White MA, Wong K, et al. Mouse genomic variation and its effect on phenotypes and gene regulation. Nature. 2011;477(7364):289–94.
Gibbons JG, Branco AT, Godinho SA, Yu S, Lemos B. Concerted copy number variation balances ribosomal DNA dosage in human and mouse genomes. Proc. Natl. Acad. Sci. USA. 2015;12(8):2485–90.
Agrawal S, Ganley ARD. The conservation landscape of the human ribosomal RNA gene repeats. PLoS One. 2018;13(12):e0207531.
Gigante S, Gouil Q, Lucattini A, Keniry A, Beck T, Tinning M, et al. Using long-read sequencing to detect imprinted DNA methylation. Nucleic Acids Res. 2019;47(8):e46.
We thank Profs Sue Ozanne and Oliver Rando for feedback on the work, Dr Miguel Branco for the H3K9me3 Ab, and Dr Pankaj Dubey for help in CUT&Tag-seq library preparation. We also thank The Bart’s and the London Genome Centre for sequencing and the assistance of the ITS Research team at Queen Mary University of London.
This manuscript was previously reviewed at another journal and no review history is available.
Peer review information
Stephanie McClelland and Anahita Bishop were the primary editors of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Biotechnology and Biological Sciences Research Council grant BB/G00711X/1 (VKR); Biotechnology and Biological Sciences Research Council grant BB/R00675X/1 (FRA, RAES, VKR); The Barts Charity Research grant MGU0390 (VKR); Rosetrees PhD studentship A1903 to SY (VKR); Biotechnology and Biological Sciences Research Council LIDO studentship BB/M009513/1 (ZA); Academy of Medical Sciences award SBF003\1026 (MLH); Edmond and Lily Safra Research Fellowship (SJM); Royal Society Research Grant RGS\R2\180202 (MLH); MRC grant MR/T000783/1 (MMP); The Barts Charity Research grant MGU0475 (MMP); The Wellcome Trust grant 204843/Z/16/Z, (NH, ML); JSPS KAKENHI 21K06138 (HY).
Ethics approval and consent to participate
No ethical approval was required for this study.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Rodriguez-Algarra, F., Seaborne, R.A.E., Danson, A.F. et al. Genetic variation at mouse and human ribosomal DNA influences associated epigenetic states. Genome Biol 23, 54 (2022). https://doi.org/10.1186/s13059-022-02617-x