Introduction

Genomic imprinting is an epigenetic mechanism that results in monoallelic expression of genes depending on parent-of-origin of the allele. Several studies have shown that genomic imprinting is conserved among mammalian species. However, there is accumulating evidence of the lack of conservation of imprinted genes. In a survey of 63 examined protein-coding imprinted genes, 35 in human and 54 in mouse, only 26 genes were reported to be conserved in both species (Morison et al. 2005). Paulsen et al. (2000) investigated the sequence conservation among six genes on distal chromosome 7 in the mouse and their human orthologs on chromosome 11p15. Although the organization of the mouse and human genes was found to be highly conserved, some genes showed nonimprinting patterns in both species.

Most known imprinted genes have been identified in the mouse and human, and few genes have been reported to be imprinted in cattle. Recently, however, we and others have reported the conserved imprinting of several bovine genes, including IGF2R (Killian et al. 2001); NESP55 (Khatib 2004); IGF2, MEG3, XIST (Dindot et al. 2004); PEG3 (Kim et al. 2004); H19, NAP1L5, and NNAT (Zaitoun and Khatib 2006). Conversely, we reported that imprinting of DCN, SDHD, COPG2 (Khatib 2005a), and SLC38A4 (Zaitoun and Khatib 2006) is not conserved among human, mouse, and cattle. Comparative analysis of genomic imprinting across mammalian species would offer a powerful tool for elucidating the mechanisms regulating the unique expression of imprinted genes (Dindot et al. 2004; Killian et al. 2001). Hence, the intent of this study is to compare the imprinting status of cattle genes with that of the human and mouse orthologs.

Recently, the question of whether imprinted genes have sequence characteristics that distinguish them from nonimprinted genes is drawing the attention of several research groups. Such structural differences might elucidate the mechanisms leading to allele-specific expression of imprinted genes (Okamura and Ito 2006). Greally (2002) found that the main sequence characteristic of human imprinted genes is a lower incidence of short interspersed nuclear elements (SINEs). Allen et al. (2003) reported that the densities of CpG islands and SINEs were lower in the flanking regions of monoallelically expressed and imprinted genes when compared to biallelically expressed genes. They also showed that monoallelically expressed and imprinted genes were flanked by regions with a high density of long interspersed nuclear elements (LINEs) compared to biallelically expressed genes (Allen et al. 2003). In a search for sequence features in the IGF2 gene in mammalian species where it is either imprinted or nonimprinted, Weidman et al. (2004) found that the imprinting of IGF2 is strongly associated with a lack of SINEs. Shirohzu et al. (2004) found that a 210-kb DNA segment, located on mouse chromosome 7 between two imprinting loci, was rich in tandem repeats, LINE-1 elements, and retroviral insertions. Additional support for the hypothesis of the lower density of SINEs defining imprinted regions was provided by Walter et al. (2006). They reported that the frequency of SINEs in imprinted genes was 7.32% compared to 13.9% in nonimprinted genes. Also, they reported that in genes with a G + C content of greater than 40%, the frequency of LINEs was higher in imprinted genes than in control genes. An interesting structural element that distinguishes imprinted from nonimprinted genes was reported for the IMPACT gene, which is known to be imprinted in mouse but not in human. The imprinting status of IMPACT was found to be associated with the presence of tandem repeats in the CpG islands of the mouse gene, whereas the nonimprinted gene in human lacks repeats and differential methylation in CpG islands (Okamura and Ito 2006). A recent study that included 38 imprinted human, 39 imprinted mouse, and 79 control genes showed that CpG islands of imprinted genes were enriched in tandem repeats compared to nonimprinted genes (Hutter et al. 2006).

The hypothesis that structural elements have a role in the regulation and maintenance of genomic imprinting is attractive, but these elements have not yet been reported in bovine sequences. The objectives of this study were therefore to evaluate G + C content and the occurrence of CpG islands and retrotransposable elements in bovine genes, and to perform comparative analyses between imprinted and nonimprinted cattle genes and among cattle, mouse, and human genes. Such inter- and intraspecies comparative analyses would provide insight into the mechanisms and the evolution of genomic imprinting.

Methods

Gene selection

A total of 22 genes with known imprinting status in cattle were selected for comparative analysis: IGF2, IGF2R, MEG3, MEST, PEG3, XIST, and ZIM2 were selected from the literature (Table 1), whereas DCN, NESP55-GNAS, H19, MAGEL2, NAP1L5, NNAT, RTL1, SDHD, SLC38A4, ASB4, CD81, HTR2A, OSBPL5, PEG10, and TSSC4 were selected based on work in our laboratory. Data on the imprinting status of the orthologous human and mouse genes were obtained from the literature and from the Catalogue of Imprinted Genes (http://www.igc.otago.ac.nz/). For comparison with the imprinted genes, 20 genes that had previously been determined to show biallelic expression in various experiments in our laboratory were selected to serve as controls (Table 2).

Table 1 Comparative imprinting analysis of human, mouse, and cattle genes
Table 2 The number of tandem repeats (TR) and CpG islands in coding sequences and in total sequences (coding and 50-kb up- and downstream flanking sequences) of imprinted and control genes

Sequence analysis

Imprinted and control gene sequences were studied in two ways. In the first analysis, a total of 245,823 bp of coding sequence from 14 imprinted cattle genes (Table 2) and a total of 909,448 bp of coding sequence from the 20 control genes were searched, and CpG islands, direct tandem repeats, SINEs, LINEs, long terminal repeats (LTRs), and G + C content were quantified. Also, a total of 8,821,311 bp from the bovine X chromosome were searched for LINE elements. In the second analysis, 50 kb of upstream sequence from the transcription initiation site and 50 kb of downstream sequence from the transcription termination site, in addition to the coding regions of the same 34 genes, were analyzed for the aforementioned sequence elements. The exact sizes of the upstream and downstream regions of each gene were determined considering the location of potential regulatory regions. TSSC4 was not included in the second analysis because its flanking sequences have not yet been identified. For this analysis, a total of 2571 genes from bovine chromosome 5 (n = 1335), chromosome 6 (n = 565), and chromosome 14 (n = 671) were selected to serve as controls for SINE and LINE densities.
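The flanking windows used in the second analysis can be illustrated with a minimal sketch. It assumes 1-based inclusive coordinates on the forward strand and a fixed 50-kb flank clipped at the chromosome ends; the function name `flanking_regions` and its parameters are hypothetical and are not taken from the study, which adjusted the exact window sizes per gene.

```python
def flanking_regions(tx_start, tx_end, chrom_len, flank=50_000):
    """Return (upstream, downstream) windows around a transcribed region.

    Coordinates are 1-based and inclusive, forward strand assumed.
    Each window is a (start, end) tuple, or None if no sequence is
    available on that side. Windows are clipped to [1, chrom_len].
    """
    # Upstream window ends one base before the transcription start.
    upstream = (max(1, tx_start - flank), tx_start - 1) if tx_start > 1 else None
    # Downstream window begins one base after the transcription end.
    downstream = (tx_end + 1, min(chrom_len, tx_end + flank)) if tx_end < chrom_len else None
    return upstream, downstream


# Hypothetical gene spanning 100,001-120,000 on a 1-Mb sequence.
up, down = flanking_regions(100_001, 120_000, 1_000_000)
```

For a gene on the reverse strand, the two returned windows would simply swap roles, since upstream is defined relative to the direction of transcription.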

CpG islands were identified using the EMBOSS program cpgplot (http://www.bioweb.pasteur.fr/seqanal/interfaces/cpgplot.html) with a window size of 120 bp, window shift increment of 1 bp, minimum island length of 200 bp, minimum observed-to-expected ratio of CpG dinucleotides of 0.6, and minimum G + C content of 50% (Gardiner-Garden and Frommer 1987). G + C content was calculated using the EMBOSS program geecee (http://www.bioweb.pasteur.fr/docs/EMBOSS/geecee.html) or RepeatMasker (http://www.repeatmasker.org). Direct tandem repeats were identified using the Tandem Repeats Finder version 3.21 (http://www.tandem.bu.edu/trf/trf.html). The alignment parameters for match, mismatch, and indels were 2, 5, and 7, respectively. The minimum alignment score to report a repeat was 100 and the maximum period size was 2000 bp.
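The two window-level criteria above (G + C content and the observed-to-expected CpG ratio) can be sketched as follows. This is a simplified illustration and not the cpgplot implementation: it uses the common definition of the ratio, (number of CpG × window length) / (number of C × number of G), and it does not merge passing windows into islands of at least 200 bp. All function names are hypothetical.

```python
def gc_percent(seq):
    """Percentage of G and C bases in seq (case-insensitive)."""
    s = seq.upper()
    if not s:
        return 0.0
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def cpg_obs_exp(seq):
    """Observed-to-expected CpG ratio: (#CpG * N) / (#C * #G)."""
    s = seq.upper()
    c, g = s.count("C"), s.count("G")
    if c == 0 or g == 0:
        return 0.0
    return s.count("CG") * len(s) / (c * g)


def window_passes(seq, min_gc=50.0, min_oe=0.6):
    """True if a window meets both thresholds quoted in the text."""
    return gc_percent(seq) >= min_gc and cpg_obs_exp(seq) >= min_oe


def candidate_windows(seq, window=120, step=1, min_gc=50.0, min_oe=0.6):
    """Start offsets (0-based) of all windows that meet both thresholds."""
    return [
        i
        for i in range(0, len(seq) - window + 1, step)
        if window_passes(seq[i:i + window], min_gc, min_oe)
    ]
```

A full island caller would additionally join runs of consecutive passing windows and keep only those spanning 200 bp or more.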

The multiple sequence alignment program ClustalW (http://www.ebi.ac.uk/clustalw) was used to align sequences of direct tandem repeats identified in imprinted genes in human, mouse, and cattle. Repeat masking was done using RepeatMasker open version 3.1.6 (http://www.repeatmasker.org). This program searches DNA sequences in FASTA format for interspersed repeats and returns a masked query sequence ready for database searches. A Perl script was used to calculate transposable element frequency for each gene based on RepeatMasker results. To test for statistically significant differences between imprinted and nonimprinted genes regarding their structural sequence elements (G + C content, SINEs, LINEs, and LTRs), we used the Wilcoxon-Mann-Whitney test. Boxplots of the distributions of LINE-1 and SINE frequencies were plotted using the statistics package R (R-project, http://www.cran.r-project.org/).
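The per-gene transposable element frequency can be understood as the percentage of a gene's sequence covered by each repeat class. The sketch below is written in Python rather than Perl and is not the authors' script. It assumes the RepeatMasker hits have already been reduced to (start, end, class) tuples with 1-based inclusive coordinates, and it merges overlapping hits so that no base is counted twice; whether the original script did so is not stated. The example hits are hypothetical.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or adjacent 1-based inclusive (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def repeat_frequency(hits, seq_len):
    """Percent of seq_len covered by each repeat class.

    hits: iterable of (start, end, repeat_class) tuples.
    """
    by_class = defaultdict(list)
    for start, end, cls in hits:
        by_class[cls].append((start, end))
    return {
        cls: 100.0 * sum(e - s + 1 for s, e in merge_intervals(ivs)) / seq_len
        for cls, ivs in by_class.items()
    }


# Hypothetical hits on a 1000-bp sequence; the two SINE hits overlap.
hits = [(1, 100, "SINE"), (51, 150, "SINE"), (301, 400, "LINE")]
freq = repeat_frequency(hits, 1000)
```

Here the overlapping SINE hits cover bases 1 to 150 once, so overlap does not inflate the reported percentage.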

Results

Conservation of genomic imprinting in mammalian species

Table 1 shows the imprinting status of cattle genes and their human and mouse orthologs. Eleven genes were found to be imprinted in all three species; of those, eight genes were paternally expressed and three were maternally expressed. Of 22 genes known to be imprinted in human or in mouse, 14 were found to be imprinted in cattle. Of the remaining genes, DCN, CD81, SLC38A4, and ASB4—known to be imprinted in mouse—were found to be not imprinted in cattle. For SDHD and HTR2A, conflicting data have been reported regarding their imprinting status in human. The ZIM2 and OSBPL5 genes were reported to be imprinted in human and mouse but found to be not imprinted in cattle.

Sequence characteristics of coding sequences of imprinted and control genes

Genomic sequence characteristics of a total of 245,823 bp from the coding sequences of 14 imprinted cattle genes were compared to those of 909,443 bp of coding sequences from 20 control genes (Table 2). The average number of tandem repeats per kb in imprinted genes was 0.15, whereas in control genes the average was 0.021 tandem repeats per kb. Similarly, the average number of CpG islands per kb was higher in imprinted (0.285) than in control genes (0.023). The frequency of G + C was also higher in imprinted genes (51%) than in control genes (45%) (p = 0.018). For imprinted genes, 11 tandem repeats were found in CpG islands of three genes (IGF2, MAGEL2, and PEG3) compared to one tandem repeat found in one control gene (WARS). Figure 1 shows boxplots of the frequency of SINEs and LINE-1s in imprinted and control genes. The frequency of SINEs was significantly lower in imprinted genes, 5.5%, whereas the frequency of these elements in control genes was 16.2% (p < 0.0001). The genes NESP55, IGF2, MAGEL2, NNAT, RTL1, and PEG10 had no SINEs in their intragenic sequences, whereas IGF2R, MEST, NAP1L5, PEG3, XIST, and TSSC4 had SINEs at frequencies that ranged from 2.1% to 10.7%. In contrast, SINEs were found in all of the 20 control genes examined in this study. The frequencies of SINEs in control genes ranged from 0.7% to 24.6%. Likewise, the frequency of LINEs was significantly lower (p = 0.0003) in imprinted genes (4.7%) than in control genes (13.7%). In contrast, frequency of LINEs in a total of 8,821,311 bp from the bovine X chromosome was 26.8%.
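The per-kb densities above are counts divided by sequence length in kb, and the size of the difference between gene groups can be expressed as a fold ratio. The short sketch below works through that arithmetic using only the densities quoted in this paragraph; the helper names `per_kb` and `fold_difference` are hypothetical.

```python
def per_kb(count, length_bp):
    """Number of elements per kilobase of sequence."""
    return count / (length_bp / 1000.0)


def fold_difference(imprinted, control):
    """Ratio of the imprinted-gene density to the control-gene density."""
    return imprinted / control


# Densities quoted in the text (elements per kb of coding sequence).
tr_fold = fold_difference(0.15, 0.021)    # tandem repeats
cpg_fold = fold_difference(0.285, 0.023)  # CpG islands

print(f"tandem repeats: {tr_fold:.1f}-fold, CpG islands: {cpg_fold:.1f}-fold")
```

On these figures, tandem repeats are roughly 7-fold and CpG islands roughly 12-fold denser in the coding sequences of imprinted genes than in those of control genes.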

Fig. 1 Boxplot graphs (median and interquartile range) representing (a) SINE and (b) LINE-1 frequencies identified by the RepeatMasker program in 14 imprinted cattle genes and in the coding sequences of 20 control (nonimprinted) cattle genes. Wilcoxon-Mann-Whitney test revealed a statistically significant difference for the occurrence of SINEs and LINE-1s between imprinted and control genes

Also, the frequency of LTRs was significantly lower (p = 0.0116) in imprinted genes (0.4%) compared to control genes (1.7%). Only two imprinted genes (IGF2R and XIST) had these elements, whereas 11 of 20 control genes had LTRs with frequencies that ranged from 0.3% to 3.8%.

To test whether tandem repeats found in imprinted genes are conserved among mammalian species, we used the Tandem Repeats Finder to identify tandem repeats in the imprinted genes listed in Table 1. A total of 64, 50, and 45 tandem repeats were found in 12 human, 11 mouse, and 13 cattle genes, respectively. Supplementary Tables 1, 2, and 3 show repeat size, copy number, and sequence of tandem repeats found in human, mouse, and cattle genes, respectively. To identify homologous tandem repeats in human, mouse, and cattle, we used the multiple sequence alignment program ClustalW (http://www.ebi.ac.uk/clustalw). Alignment was performed only for genes that were found to be imprinted in at least two species (see Table 1). Table 3 shows the number of tandem repeats found in nine genes and the sequence alignment of these repeats across human, mouse, and cattle species. For GNAS, a total of four alignments were found between human and mouse repeats, in which a 75% sequence similarity (score) was found. It is worth noting that H3 and M1 repeats that showed a score of 75% are located in CpG islands of GNAS. For IGF2R, six alignments were found between human/cow and human/mouse repeats. The highest alignment score (76%) was between human repeat 5 (H5) and cow repeat 1 (C1). The repeats H5, C1, C3, and C11 were found in CpG islands of IGF2R. PEG3 showed 12 sequence alignments of tandem repeats in human, mouse, and cattle with scores that ranged from 53% to 90%: five human/cow alignments, four human/mouse alignments, and three alignments of mouse/cow (Table 3). Highly conserved tandem repeats across species with scores of at least 80% were found for MAGEL2, MEST, PEG3, RTL1, OSBPL5, and PEG10.

Table 3 Sequence alignment of tandem repeats in imprinted genes across human, mouse, and cattle

Sequence characteristics of flanking sequences of imprinted and control genes

For the analysis of flanking sequences of imprinted and control genes we chose a 50-kb window, considering the location of potential regulatory regions. Tandem repeats were found in all imprinted and control genes examined (Table 2). CpG islands were identified in all imprinted genes except for XIST and in 16 of 20 control genes (Table 2). Table 4 shows the average number of tandem repeats and CpG islands per kb in coding and flanking sequences of imprinted and control genes. Of considerable interest was the high density of tandem repeats and CpG islands observed in coding sequences of imprinted genes (0.151/kb and 0.285/kb, respectively) vs. control genes (0.021/kb and 0.023/kb, respectively). Figure 2 and Supplementary Figure 1 show the distribution of CpG islands in coding and flanking sequences of imprinted genes compared with control genes. A higher density of CpG islands was observed in imprinted gene regions than in regions of biallelically expressed genes.

Table 4 Number of tandem repeats (TR) and CpG islands per kb genomic sequences in coding and flanking sequences of imprinted and biallelically expressed genes
Fig. 2 Distribution of CpG islands in a subset of (a) imprinted and (b) biallelically expressed genes with 50-kb upstream and downstream flanking sequences. Vertical bars indicate position of each CpG island and the thick line on the x axis corresponds to the coding region of the gene. The position of each genomic region counts from the upstream flanking region of each gene (×10^5 bp). The distribution of CpG islands in other genes examined is presented in Supplementary Fig. 1

To test whether SINE and LINE-1 densities are different between coding sequences and flanking sequences, we calculated the frequencies of these elements in imprinted and control genes and in a total of 2571 genes from bovine chromosomes 5, 6, and 14. Figure 3 shows boxplots of SINE and LINE-1 frequencies in imprinted and control genes. The Wilcoxon-Mann-Whitney test revealed significantly lower SINE density in imprinted genes compared to biallelically expressed genes (p = 0.002) and chromosome 14 (p = 0.0015), chromosome 5 (p = 0.0005), and chromosome 6 (p = 0.0001) genes (Fig. 3a). In contrast, LINE-1 frequency was not significantly different between gene groups examined (Fig. 3b).
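For readers unfamiliar with the Wilcoxon-Mann-Whitney test, the sketch below shows how the U statistic and a two-sided p value can be computed from two groups of per-gene frequencies. It is a plain-Python illustration that uses the tie-corrected normal approximation without continuity correction. The source does not state which implementation or exact-versus-approximate method was used, so p values from this sketch need not match those reported. The input values are invented for illustration and are not data from the study.

```python
import math


def average_ranks(values):
    """1-based ranks of values, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p) via the normal approximation."""
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    # Tie correction: sum of (t^3 - t) over groups of tied values.
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u1, 1.0
    z = (u1 - n1 * n2 / 2.0) / math.sqrt(variance)
    return u1, math.erfc(abs(z) / math.sqrt(2.0))


# Invented SINE frequencies (%) for two small groups of genes.
imprinted = [0.0, 2.1, 5.5, 10.7]
control = [9.9, 12.0, 16.2, 20.1, 24.6]
u, p = mann_whitney_u(imprinted, control)
```

With small samples such as the 14 imprinted genes analyzed here, an exact test may be preferable to the normal approximation used in this sketch.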

Fig. 3 Boxplots of retrotransposable element densities in imprinted genes, in biallelically expressed control genes, and in genes from bovine chromosome 14 (n = 671), chromosome 5 (n = 1335), and chromosome 6 (n = 565). SINE and LINE-1 frequencies were calculated for the coding sequence and 50-kb upstream and downstream of the coding region. (a) SINE frequency in imprinted genes was 10.89% compared with 16.57%, 15.42%, 15.65%, and 16.46% in biallelically expressed control genes and genes from BTA14, BTA5, and BTA6, respectively. (b) LINE-1 frequencies were 10.71%, 9.30%, 9.80%, 9.34%, and 9.93% in imprinted, biallelically expressed control genes, and genes from BTA14, BTA5, and BTA6, respectively

Discussion

To understand the evolution of genomic imprinting and the mechanisms controlling allele-specific expression of imprinted genes, it is crucial to identify species-specific imprinted genes and compare their structural features and to identify sequence elements that differentiate imprinted from nonimprinted genes (Okamura and Ito 2006). In this study we investigated the conservation of imprinting of 22 genes in human, mouse, and cattle. In addition, we analyzed the occurrence of the sequence elements CpG islands, C + G content, tandem repeats, and retrotransposable elements in imprinted and control cattle genes. Also, we investigated the conservation of tandem repeats located in imprinted genes in human, mouse, and cattle.

There is accumulating evidence of limited conservation of imprinted genes across species. Recently, we reported that the SDHD and COPG2 genes were not imprinted in cattle and sheep tissues (Khatib 2005a, b). Also, Kim et al. (2004) showed that ZIM2 was biallelically expressed in cattle in contrast to the monoallelic expression observed in human and mouse. Morison et al. (2005) reported that of 63 protein-coding imprinted genes, only 26 were imprinted in both human and mouse. Monk et al. (2006) reported the lack of imprinting of six human genes and the polymorphic imprinting of another three genes, all known to be placenta-specific, imprinted in the mouse.

Recently, we investigated the imprinting status of CD81, TSSC4, and OSBPL5—reported to be placenta-specific imprinted in mouse—on bovine chromosome 29 and the cluster of PEG10 and ASB4 on bovine chromosome 4 (unpublished data). CD81, TSSC4, and OSBPL5 were found to be expressed in all fetal tissues examined, including ovary, skeletal muscle, liver, pituitary, mammary gland, kidney, brain, spleen, heart, pancreas, eye, and caruncle. Imprinting analysis revealed biallelic expression of CD81 in all cattle tissues examined, like the human gene (Monk et al. 2006), but in contrast to the mouse ortholog which was reported to be maternally expressed in placental tissues (Lewis et al. 2004). Species-specific imprinting was also observed for OSBPL5, which is known to be imprinted in mouse (Engemann et al. 2000) and human placenta (Higashimoto et al. 2000) but biallelically expressed in other tissues. In contrast to human and mouse, our study revealed biallelic expression of OSBPL5 in both placental and nonplacental tissues (data not shown). Similarly, TSSC4 was shown to be imprinted in cattle (data not shown) and mouse but not in human (Monk et al. 2006).

The number of known imprinted genes in cattle is small and the mechanisms regulating imprinting in this species are poorly understood. However, the lack of conservation of placenta-specific imprinted genes between mouse and human might be due to allele-specific histone modifications present in mouse but absent in human genes (Monk et al. 2006). This may be a contributing factor to why, for the placenta-specific genes reported in this study, the imprinting pattern was not conserved across human, mouse, and cattle. For ASB4, the imprinting status in human is not known and the mouse gene has been reported to be maternally expressed in a wide range of fetal tissues (Mizuno et al. 2002). In our study, although ASB4’s expression pattern was similar to that of mouse, the bovine gene showed biallelic expression in all examined fetal tissues (data not shown). Thus, further studies involving more imprinted genes and more species may be necessary to confirm whether species-specific imprinting is the rule or the exception.

Analysis of characteristic sequence elements revealed that G + C content was significantly higher in imprinted cattle genes when compared to that of control genes. In contrast, Hutter et al. (2006) did not find significant differences in G + C content between imprinted genes and control sequences in human and mouse. In a different study, Walter et al. (2006) reported that the G + C content was similar between imprinted genes and a subset of randomly selected autosomal genes in mouse. The discrepancy between our results and those of Hutter et al. (2006) and Walter et al. (2006) could be due to either species-specific differences or to the small number of imprinted cattle genes in our study.

For tandem repeats and CpG islands, there is accumulating evidence correlating these elements with genomic imprinting. Accordingly, Neumann et al. (1995) suggested using these elements as a search tool for imprinted genes. In this study the average number of CpG islands per kb of genomic sequence was higher in the coding sequences of imprinted genes than in control genes. This result agrees with the high density of CpG islands reported in imprinted genes in mouse but differs from that found for imprinted human genes (Hutter et al. 2006).

For many imprinted genes, monoallelic expression is associated with a differentially methylated region (DMR). In a search for sequence elements specific to primary DMRs in the mouse, it has been found that CpG content is higher in DMRs than in whole-genome sequence and in nonimprinted CpG islands (Kobayashi et al. 2006). Also, it has been found that paternally methylated DMRs have a lower density of CpGs than maternally methylated DMRs (Kobayashi et al. 2006). Of the 14 imprinted cattle genes examined here, 10 have mouse homologs that are associated with DMRs (reviewed in Kobayashi et al. 2006). At present, cattle DMRs are poorly understood. Hence, identification of additional imprinted genes and further investigation of cattle DMRs would improve our understanding of characteristics of imprinted genes.

Similar to CpG islands, we found a notable difference in the abundance of tandem repeats in bovine imprinted genes compared with control genes. This is in agreement with previous reports on the occurrence of these repeats in mouse and human imprinted genes (Hutter et al. 2006; Okamura and Ito 2006; Shirohzu et al. 2004). The high density of CpG islands and tandem repeats observed in the coding sequences of imprinted genes compared with 5′ and 3′ flanking sequences and to coding and flanking sequences of biallelically expressed genes implies that these elements have an important role in the monoallelic expression of imprinted genes.

Although the mechanisms by which tandem repeats affect genomic imprinting are not currently known, comparative analysis between species could provide a powerful tool to understand these mechanisms. This approach has proven successful in the identification of tandem repeats associated with imprinted genes Rasgrf1 and Impact (reviewed in Okamura and Ito 2006). Of considerable interest was our observation that highly conserved tandem repeats were found in nine imprinted genes in human, mouse, and cattle species. For GNAS and IGF2R, conserved tandem repeats were found in their CpG islands. Such high conservation indicates that these repeats might have a role in the regulation of allele-specific expression by attracting epigenetic modifications (Hutter et al. 2006; Neumann et al. 1995).

The observation that X-chromosome inactivation is associated with a high concentration of LINEs (Lyon 1998) has prompted several research groups to investigate the association of retrotransposable elements with human and mouse imprinted loci. In this study SINE elements were notably fewer in both coding and flanking sequences of imprinted genes compared to control genes. In a search for genomic characteristics that distinguish imprinted from nonimprinted genes in human, Greally (2002) found that the concentration of SINEs was much lower in imprinted loci compared to biallelically expressed genes. In a different study, Allen et al. (2003) found that the frequency of SINE sequences was lower in the flanking regions of monoallelic and imprinted genes. It is assumed that SINEs are either readily removed from imprinted regions or they are unable to transpose to these regions (Greally 2002).

LINEs were found to be significantly underrepresented in coding sequences of bovine imprinted genes compared with control genes, in contrast to Walter et al. (2006) who found that LINE elements were significantly denser in imprinted genes with a G + C content of greater than 40% compared with nonimprinted genes in the mouse. On the other hand, analysis of total coding and flanking sequences revealed that LINE-1 frequencies were not statistically different between imprinted and other gene groups examined. In addition, LINE-1 frequency was higher in the combined coding and flanking regions of imprinted genes (10.7%) than in coding sequences alone (4.7%). A high frequency of LINE-1s was observed in flanking regions of human and mouse monoallelically expressed genes but not necessarily in the regions of imprinted genes (Allen et al. 2003). It is conceivable that the distribution of LINEs in regions of imprinted genes could be species-specific.

Data obtained in this study showed that an approximately 8.8-Mb sequence of the bovine X chromosome has 26.8% LINEs compared to 13.7% found in the autosomal nonimprinted genes. That LINE densities were low in cattle imprinted genes but high in the bovine X chromosome suggests that X-chromosome inactivation and imprinting in cattle are not necessarily controlled by the same mechanisms. It has been suggested that LINEs have a role in X-chromosome inactivation based on the density of these elements in the X chromosome (Lyon 2006). In fact, it has been found that human and mouse X chromosomes have 26% and 28.5% LINEs compared with 13% and 14.6% LINE sequences in autosomal sequences, respectively (Lyon 2006). Thus, the high density of LINEs in the bovine X chromosome could confirm the hypothesis that LINEs are a common feature of mammalian X chromosomes and that these elements have a function in X-chromosome inactivation (Lyon 2006).

In summary, in this study we investigated the imprinting status of 22 genes in human, mouse, and cattle and found that only 11 genes were conserved across the three species, of which eight genes were paternally expressed and three were maternally expressed. Comparison of sequence characteristics between imprinted and nonimprinted cattle genes revealed that coding sequences of imprinted genes have a higher G + C content and more CpG islands and tandem repeats than biallelically expressed genes. In contrast, imprinted genes have a lower concentration of retrotransposable elements compared with control genes. Of particular interest was the finding of conserved tandem repeat sequences across the three species, which indicates that these elements may have a role in the regulation of imprinting. Taken together, these sequence characteristics could be employed in the prediction of imprinted genes.