Background

Across mammalian genomes, the retrotransposon class of repeat elements make up nearly half the nuclear DNA, with long terminal repeats (LTRs) alone accounting for 8-10% [1]. Many retrotransposons retain the ability to move (transpose) to new locations in the genome, a potentially deleterious phenomenon that can result in direct disruption to coding regions [2, 3]. Therefore, the suppression of retrotransposons is vital to organismal survival, particularly through sensitive life stages, and is primarily accomplished via epigenetic mechanisms, including DNA methylation [4, 5]. Epigenetic suppression is typically established early in development at the blastocyst stage and maintained during differentiation of the primordial germ cells [6]. Importantly, in certain cases suppression is incomplete, leading to adverse consequences [7]. Alternatively, this incomplete suppression may be an evolutionarily selected method of fine-tuning gene expression [8]. Genome-wide, little is known about the extent to which interindividual variation of DNA methylation at these elements is affected by environmental factors and/or their phylogenetic lineage [9].

Intracisternal A particles (IAP) are a class of murine retrotransposons named for the appearance of budding doughnut shaped particles in the cisternae of the endoplasmic reticulum [10]. IAPs are members of the endogenous retrovirus family (ERV) class II [11]. Structurally, a full length IAP consists of gag, pro, and pol genes capped at each end by a direct pair of LTRs, each approximately 350 bp in length [12]. However, the largest majority of IAP transcripts issue from truncated 5.4 kb copies of IAP of subtype IΔ1 rather than full length 7.2 kb copies [13]. Previous work has identified only a subset of elements likely to be active, due to genomic position, sequence mutation, or methylation status [13]. The number of paralogous IAP LTRs in the genome has been uncertain with estimates ranging from less than 1000 to over 9000 with significant polymorphism among strains [10, 14, 15]. This family of ERVs is highly active and rapidly transposing, with estimates of 60% of elements being strain specific [15]. It is advantageous for these mobile elements to be kept under tight control by heavy DNA methylation and tight chromatin structure.

A small number of “metastable epialleles” have been identified in mice (e.g. Avy, CabpIAP, and AxinFu) in which variable methylation of the 5′ LTR of an antisense IAP insertion results in dysregulation of gene expression concomitant with the level of methylation [1620]. Metastable epialleles show variable expression among individuals with epigenetic profiles that are consistent across tissues and are thus likely to have been set prior to germ line differentiation [2123]. Further, the distribution of variable expressivity has been shifted at these metastable epialleles following maternal exposure to nutritional and environmental factors [21, 22, 24, 25]. To date, no studies have determined the genetic relationship underlying the ability of these IAP LTRs to be variably methylated. Two of the epialleles, Avy and AxinFu, were identified due to dramatic phenotypes induced by their insertion into nearby protein-coding genes [18, 26]. Conversely, CabpIAP was identified by a search of Genbank cDNA databases for chimeric sequence containing IAP LTR and genic sequence [20].

To systematically investigate the relationship of genetics and metastability we applied phylogenetic methods to identify candidate IAP retrotransposons and validate their variable methylation status. Through this approach, we find the phylogeny of metastable loci sheds light on the evolutionary basis of metastability. Additionally, the validated analysis of epigenetic state across individual isogenic mice now increases the number of known metastable epialleles, which can serve as a test panel for environmental modifications to the epigenome.

Results

IAP distribution in the genome

Initial pairwise alignment of the two most well-studied metastable epialleles, Avy and CabpIAP revealed that the LTRs of these two insertions exhibit 98.5% sequence identity over the length of shared sequence (Figure 1). RepeatMasker was used to scan the C57BL/6 (mm9) mouse genome and identified 10802 IAP LTR elements. The majority of these LTRs were pairs found on the 5’ and 3’ end of the same IAP insertion, however a fraction were present as solo LTRs. Since the IAPs underlying both Avy and CabpIAP are of the class IΔ1, type IAPLTR1_Mm as identified in RepBase, this subtype was used to filter the complete list of transposons, resulting in a total list of 1388 IAPLTR1_Mm elements after removing 65 unmapped, short, or non-alignable elements. There was no bias in the orientation of insertion as 691 sense and 707 anti-sense IAPLTR1_Mm elements were identified throughout the genome.

Figure 1
figure 1

Multiple sequence alignment. The sequence of the Avy and CabpIAP LTR insertions are compared to the IAPLTR1_Mm consensus sequence. Avy shares 98.5% sequence identity with CabpIAP (85% sequence identity over the length of CabpIAP). Colors indicate base changes from consensus (A = red, T = blue, C = yellow, G = green).

Phylogenetic similarity of known metastable IAPs

A neighbor-joining tree of these 1388 elements revealed three distinct clades with Avy and CabpIAP clustering within clade 1 (red), more closely than 99% of all other elements (Figure 2). The largest cluster, clade 3 (black), contains the most conserved elements, and consists of 1130 sequences with an average sequence divergence of only 1.8% from the consensus. Clade 1 (red) and clade 2 (green) contain 147 and 113 elements each, and are more divergent, with 9.1% and 12.9% average differences from the consensus respectively. Bias in location of these insertions was not identified, with members of each clade represented across numerous chromosomes and locations (Figure 3). Unique identifiers were assigned to each element in our list, in the format IAP27-IAP1414, used throughout the study.

Figure 2
figure 2

Neighbor-joining tree. Illustrated are 1388 IAPLTR_Mm elements drawn from the C57BL/6 genome and including Avy and CabpIAP elements for a total of 1390. Subclades are highlighted in red (clade 1) and green (clade 2), with the remaining elements in black (clade 3). The bifurcation of subclades 1 and 2 demonstrates duplication and divergence events followed by rapid radiation of these subfamilies. Scale bar indicates number of nucleotide substitutions per site.

Figure 3
figure 3

IAP element insertions. The 1388 insertions of IAPLTR1_Mm are plotted against their genomic location. The color of the clade corresponding to the insertion’s position on the phylogenetic tree is highlighted to the right of each chromosome. Elements from each clade in the tree are found dispersed throughout the genome. The color scale internal to the chromosomes corresponds to overall repetitive element density.

Characterization of interindividual metastability

From these candidates 7 from each clade were randomly selected to test for variable methylation using combined bisulfite and restriction analysis (CoBRA). Primers for CoBRA were designed with the forward primer internal to the IAP sequence and the reverse primer located in the flanking sequence (Figure 4A), providing an amplicon specific to an insertion and containing homologous IAP element sequence (Table 1). BceAI enzyme was chosen since each element contains two CpG sites as part of a BceAI restriction site, ACGGCG. Loss of methylation at either or both would prevent BceAI activity. Consequently, the relative intensity of cut vs. uncut bands gives a semi-quantitative measurement of methylation at these sites (Figure 4A). Uncut bands correspond to unmethylated DNA while cut bands correspond to methylated DNA. The two CpG sites examined here exhibited interindividual variation in previous studies from our group [24, 27], and are identified as sites 2 and 3 in this study. A quantitated CoBRA gel image of IAP236 is depicted in Figure 4B, revealing an 18% range of methylation at the cut site.

Figure 4
figure 4

CoBRA design. For each candidate locus we amplify a portion of the antisense strand of the 5′ LTR and unique flanking sequence (arrows) containing a BceAI recognition site. (A) The product of IAP236 is shown with cut sites and CpGs indicated by numbers 1–4. Sites 2 and 3 are conserved in all candidate loci and cut. PCR primers are underlined. (B) Percent methylation is derived from relative intensity of the cut vs. uncut bands. This locus exhibits an 18% range in variable methylation. Controls are shown in lanes 1 and 2, indicating the specificity and completeness of the digestion.

Table 1 PCR primers and conditions for CoBRA analysis

A group of 17 isogenic Avy/a genotype mice of both genders (N = 7 males and N = 10 females) from 7 litters was sacrificed at weaning and DNA was extracted from the liver. Following bisulfite conversion, candidate regions were amplified via PCR and digested with BceAI (Figure 5). A total of 21 representative loci, 7 from the red clade including CabpIAP, and 7 each from the green and black clades were evaluated. The 0% control (lane 1) was uncut in all loci, while the high percent methylated control was nearly completely digested in all loci (lane 2) indicating sufficient enzyme activity and specificity.

Figure 5
figure 5

Candidate loci with interindividual variation. DNA from 17 isogenic mouse livers at d21 was isolated and bisulfite converted. Amplification and digestion of 21 representative IAP elements show variable levels of methylation per locus and mouse. Upper bands indicate uncut PCR product corresponding to the unmethylated BceAI site. Lower bands indicate a fully methylated BceAI site consisting of CpGs 2 and 3 from Figure 4. Locus 281Y is located on the Y chromosome and is therefore amplified only from male animals.

The mean methylation across all mice for all IAP insertions was 72.5% and no mouse deviated by more than 3% (range 71-74%), showing no statistical difference (p > 0.99) (Table 2). Specific loci however, exhibited much greater variability. We found that mean methylation of individual IAPs fluctuated from a low of 59% to a high of 88% and standard deviations varied from 1.9 to 7.3, indicating that individual transposons can differ dramatically in both average value and standard deviation among mice. Thus, methylation variability ranged from 9% for the least variable element to greater than 34% for the most variable element. There were no significant differences in methylation between sexes. Importantly, the average methylation was significantly lower for both red clade elements, 70% (p = 0.017), and green clade elements, 68% (p = 0.040) than for black clade elements, 79%. Interestingly, the range through which a particular element can be methylated is also correlated with its position within the phylogenetic tree (Figure 6). The average range for red clade elements is 21%, 17% for green clade elements, and just 11% for black clade elements. The element with the greatest range is in the red clade (CabpIAP) while the element with the greatest dispersion as measured by interquartile range is found in the black clade (IAP506).

Table 2 CoBRA quantitated percent methylation across 17 mice and 21 IAP elements
Figure 6
figure 6

Boxplot of methylation values. Interindividual variation in each IAP correlates with phylogenetic clade. Average methylation values are lower for red and green clades (70% and 68%) than for the black clade (79%) suggesting higher methylation for older insertions. The average range per clade, in parentheses, is also lower, however the standard deviation is similar across clades.

Discussion

Our bioinformatic analysis identified 10802 IAP LTRs, consistent with recent estimates of ~5400 IAP transposons in the mouse genome as the vast majority of detected insertions in our analysis were paired on either end of a single IAP insertion [15, 28]. To validate candidate LTRs as metastable, we focused primarily on sequences of the specific subtype of IAP known in two cases to be variably methylated (IAPLTR1_Mm). Both the Avy and CabpIAP epialleles exhibit variable DNA methylation at specific CpG sites, which can be influenced by developmental environmental exposure [24]. Phylogenetic analysis revealed that both epialleles clustered in one of two divergent subclades in the IAPLTR1_Mm family. Combined bisulfite and restriction analysis performed on 20 additional loci found interindividual methylation to be a common feature of all elements in this family. Though the degree of variation ranged from 9% to 34% among elements within each mouse, average methylation across all LTRs varied only 3%, a confirmation that global methylation of repetitive loci does not necessarily reflect locus specific deviation. Interestingly, on average, IAP methylation was higher and less variable in the more numerous, phylogenetically conserved elements (black clade 3). While the standard method of dating LTR insertions by measuring accumulated mutations in the 5′ and 3′ LTRs is inapplicable in this case because of their young age, the percent divergence from consensus strongly suggests the two derived subclades are younger. The increased methylation of the older IAPs here fits with the evidence of Reiss et al. who found a positive correlation of age with methylation in another family of ERVs, the Early Transposon (ETn) class and hypothesized this would be a general finding in ERVs [29].

Though retrotransposons, including IAPs, are generally highly methylated, important windows of vulnerability exist in which genomic methylation is globally reprogrammed [4, 6, 21]. Reactivation during genomic reprogramming is likely responsible for the frequent germ line insertion in this lineage of IAPs in rodents, counterbalanced by selective pressure to maintain epigenetic silencing in somatic cells [17]. The presence of two diverged clades in our phylogeny suggests that the ancestor of the Avy and CabpIAP alleles at two times collected advantageous mutations that allowed it to escape genomic control and proliferate successfully throughout the genome following the master gene model [30]. Their genomic distribution also suggests that they retained retrotranspositional activity after divergence, a finding also supported previously [15].

Epigenomic studies are increasingly cataloging variation across tissues, disease states, ages, and environmental exposures. It is therefore important to understand the baseline naturally extant variation in a population of isogenic, unexposed, age-matched animals and to provide assays to detect shifts in variation mediated by other conditions. Determination of interindividual variation in DNA methylation has followed a number of methods, including coat color shifts in Avy mice [27], bisulfite sequencing, northern blots for transcript quantity [31], methylation specific PCR, and pyrosequencing [32]. The method we use here, combined bisulfite and restriction analysis (CoBRA), exploits site-specific enzyme digestion to determine methylation status. Advantages of CoBRA include semi-quantitative analysis, and relatively inexpensive costs, thereby facilitating the high throughput necessary for epidemiological testing. Indeed, CoBRA has been used to assess the epigenotype of colorectal tumors, providing direct health implications [33]. Additionally, automation platforms such as the QIAxel system (Qiagen Inc., Valencia, CA) would increase the speed of sample quantitation after enzyme digestion. Our candidate loci were also mined from the C57BL/6 genome, and as such, they are more likely to be useful as epigenetic markers in both mutant and exposed mice than previous epialleles (Avy and AxinFu), which are restricted to specific lines.

Previous studies utilizing non-phylogenetic methods of variably methylated candidate identification have identified retroelements co-located with active histone modification (H3K4me3) [34]. Further, IAPs, unlike other transposons, can induce changes in nearby chromatin marks, but no phylogenetic basis underlying epigenetic states has been explored [2]. The close sequence identity of Avy and CabpIAP insertions may be associated structurally with directed epigenetic modifications by targeting enhanced recruitment of trans factors governing their metastability. Alternatively, high sequence similarity in the divergent clades may have resulted from very recent mobilization and high activity of the parent element and its rapid proliferation. We speculate that variable DNA methylation, being a general property of elements we tested from these clades, explains both the success of this lineage in duplication, and their interindividual variation. This hypothesis is borne out by the average methylation being higher in the black clade, containing the largest number, most conserved, and likely oldest, elements. Though in general, new insertions are considered deleterious, some have been co-opted by the genome as a mechanism of driving gene expression, and epigenetically labile insertions may be under selection to “fine tune” that expression [35]. Our analysis supports the notion that mutation to escape genomic control is correlated with both increased retrotranspositional activity and decreased methylation. Consequently, these data are a strong indication that interindividual variation in methylation at repetitive elements is the default condition, rather than a rare occurrence, and that shifts in methylation profiles could affect numerous transcripts in the genome. Thus, environmental exposures capable of altering the epigenome could potentially have widespread effects for gene regulation.

Conclusions

Combined phylogenetic and DNA methylation analysis resulted in the identification of novel regions of variable methylation. The phylogenetic analysis revealed two duplication and divergence events subdividing this family into three clades. Importantly, average methylation was significantly lower for both the red and green clade elements, compared to black clade elements, supporting the notion that phylogenetically older and more uniform elements tend to be more highly methylated than younger and diverged elements. Thus, through this endeavor the number of known metastable epialleles in the mouse has dramatically increased. These sites can now serve as biomarkers for environmental modifications to the epigenome. Parallel approaches in humans are underway.

Methods

Computational analysis

For the Avy allele, the LTR studied was obtained from Genbank accession number [GenBank:AF540972.1], position 10–345. For CabpIAP, the LTR sequence was obtained from [GenBank:AL732601.16], position 179834–180227. For CoBRA and pyrosequencing analysis, four CpG sites were referenced from the Avy allele coordinates, [GenBank:AF540971.1], positions 365, 378, 380, and 393. These are annotated as positions 1–4 in this report.

Mouse genome version 37.61(mm9, 2007) was downloaded from NCBI and scanned with a local installation of RepeatMasker version open-3.3.0 http://www.repeatmasker.org using the RMblast engine http://www.repeatmasker.org/RMBlast.html with the following parameters, “RepeatMasker -xsmall -s -species mouse”. The repeat database was obtained from RepBase release 20090604 http://www.girinst.org[36]. The output was filtered for elements annotated as IAPLTR and counted to 10802. Since each IAP element contains two LTRs, this represents approximately 5000 full-length IAP elements of all subtypes. We next filtered for elements of type IΔ1, subtype IAPLTR1_Mm and removed non-chromosomal elements and 40 short elements below 330 bp. The resulting dataset contained 707 elements on the sense strand and 691 on the complementary strand (Additional file 1: Table S1). A further 10 elements were removed for lack of alignment to the IAPLTR1_Mm consensus. Alignment was performed by Geneious software version 5.6.5 http://www.geneious.com/ using the Geneious alignment algorithm with default parameters (cost matrix = 65% similarity, gap open penalty = 12, gap extension penalty = 3). The phylogenetic tree was built using neighbor-joining, no outgroup, Jukes-Cantor genetic distance model. Bootstrap of 1000 iterations showed no substantial differences in the divergence of the major clades (data not shown).

Genomic sequence for each element (for alignment and tree building) and 400 bp of flanking sequence (for primer design) was obtained by submission of the RepeatMasker generated coordinates to the Galaxy Project https://main.g2.bx.psu.edu/[37]. Unique numbers were assigned to each IAP here to aid in reference and used throughout the study (Additional file 1: Table S1). A second alignment was performed with flanking sequence to select all sites with 5’ LTR sequence, complete IAP structure, and 3’ LTR sequence. Removal of duplicate entries that matched to the 3’ LTR copy and other LTRs with repetitive flanking sequence resulted in a total of 366 candidate elements for CoBRA [38, 39]. All primers were designed to amplify the antisense strand of the 5’ LTR and proximal flanking sequence (Table 1).

DNA extraction and conversion

Mice were obtained from a forced heterozygous colony carrying the Avy allele that has been maintained with sibling mating for over 220 generations, resulting in a genetically invariant background initially based upon the C3H strain [21]. Virgin wild-type dams, 6 weeks of age, were fed phytoestrogen free AIN-93 G diets (diet 95092 with 7% corn oil substituted for 7% soybean oil; Harlan Teklad, Madison, WI) and were mated. Avy/a offspring were sacrificed at d21 and livers flash frozen for DNA extraction. A total of 17 animals (N = 7 males and N = 10 females) and some sets of littermates were used for this study. DNA extraction was performed using a standard Phenol-Chloroform protocol. Using the Qiagen Epitect kit automated on the Qiagen QIAcube® purification system, approximately 1 μg of genomic DNA was treated with sodium bisulfite. The treatment converts unmethylated cytosines to uracil, read as thymine during polymerase chain reaction (PCR), whereas the methylated cytosines are protected and remain unconverted [40].

Animals used in this study were maintained in accordance with the Guidelines for the Care and Use of Laboratory Animals (Institute of Laboratory Animal Resources, 1996) and were treated humanely and with regard for alleviation of suffering. The study protocol was approved by the University of Michigan Committee on Use and Care of Animals.

Combined bisulfite restriction analysis

CoBRA [39] was performed with enzyme BceAI (New England Biolabs®) using the following conditions for PCR: 50 ng DNA, 0.5 μl each forward and reverse primer at 10pM/μl concentration, 15 μl HotStarTaq master mix (Qiagen Inc., Valencia, CA), 3 μl Rediload™ (Invitrogen Inc., Grand Island, NY), 10 μl water for a total of 30 μl reaction volume. The reaction was run at 52°C annealing temperature for 45 cycles on a C1000 Bio-Rad thermal cycler pcr machine (Hercules, CA). Some reactions were run with altered conditions (Table 1). Enzyme digestion was performed according to manufacturer protocol, briefly, 10 μl of PCR product, 3 μl BSA (10×), 3 μl Rediload™, 3 μl NEB buffer 3 (10×), 1 μl BceAI enzyme (1 Unit), and 10 μl of water were combined for a 30 μl reaction and incubated at 37°C for 8 hours, followed by 65°C for 20 minutes and run out on a 2% agarose gel. Gels were visualized on a Gel Doc XR™ by Bio-Rad. Image quantitation was performed with Quantity One version 4.6.7 using default options (all lanes, auto detect = yes, normalize = yes, sensitivity = 10, size scale = 5) and reporting relative quantity of each band per lane. Lanes with top bands too faint for detection were considered below the limit of detection (LOD) and removed from further analysis (marked as blank in Table 2). Statistical analysis to determine significance of clade methylation differences was performed using SPSS (IBM, New York, NY) by ANOVA with multiple comparisons Tukey adjustment.

Control DNA was generated in-house by whole genome amplification using the GenomePlex® kit (Sigma-Aldrich, St. Louis, MO) for 0% methylated DNA and by CpG methylase, M.SssI treatment (Zymo Research, Irvine CA) for highly methylated control, both according to manufacturer protocols.