Background

For centuries, breeders have relied on principles of inheritance to obtain animals with desired traits [1,2,3,4,5,6,7]. However, emerging data indicate that assisted reproductive technologies (ART) may induce fetal overgrowth, producing large offspring syndrome (LOS) [8,9,10,11,12,13,14,15]. In part, these developmental anomalies arise from altered DNA methylation patterns that cause imprinting defects in imprinting control regions (ICRs), inferred differentially methylated regions (DMRs), or both [1, 2, 11, 12, 14]. Furthermore, researchers observed that somatic cell nuclear transfer (SCNT) procedures could epigenetically disturb imprinted gene expression [11].

Overall, genomic imprinting is relatively complex and requires the orchestrated action of several proteins, including ZFP57, KAP1/TRIM28, and a subset of DNA methyltransferases [16,17,18,19,20]. In ICRs/gDMRs, ZFP57 recognizes its methylated hexameric site [21] and thus plays a central role in the establishment of genomic imprints [16, 21, 22]. ZFP57 family members (KZFPs) are encoded in the hundreds by the genomes of higher vertebrates [23, 24]. Most KZFPs are essential to the recruitment of KAP1 and associated effectors to chromatin to repress transcription [23]. ZFP57 is necessary for maintaining the DNA methylation memory at multiple ICRs in mouse embryos and embryonic stem cells [16, 21, 22, 25]. In addition to ZFP57 binding sites, the ICRs often include closely spaced ZFBS-morph overlaps [26]. These overlaps consist of a ZFP57 binding site overlapping a subset of the MLL1 binding units known as morphemes [27]. Since these units are CpG-rich, they are spread across the CpG islands. In the islands that encompass coding exons, the morphemes impact codon utilization [28]. Overall, in the human genome, CpG-rich promoters encompass many CpG-rich motifs with potential regulatory characteristics [29, 30].

ZFBS-morph overlaps are composite DNA elements that could play dual but antagonistic roles in the regulation of allele-specific gene expression: binding of ZFP57 to its methylated sites would maintain allele-specific gene repression, whereas binding of MLL1 or MLL2 to CpG-rich sequences would protect ICRs from methylation and support transcription [26]. MLL1 is the founding member of a protein family whose structure includes a conserved domain (SET) that catalyzes methylation of H3K4 (lysine 4 in histone H3), producing H3K4me3 marks in nucleosomes [31]. H3K4me3 is associated with active or transcriptionally poised chromatin states [32]. In addition to the SET domain, the structures of MLL1/KMT2A and MLL2/KMT2B include a domain (MT or CXXC) that binds unmodified CpG-rich DNA [33,34,35]. De novo mutations in MLL1 cause Wiedemann-Steiner syndrome [36]. Symptoms vary and may include delayed growth and development, asymmetry of the face, hypotonia, and intellectual disability [36]. Mutations in MLL2 cause complex early-onset dystonia [37, 38]. In mouse oocytes, MLL2 was required for bulk H3K4 trimethylation [39].

In both mouse and human DNA, known ICRs/gDMRs encompass clusters of two or more ZFBS-morph overlaps [40, 41]. Therefore, we wished to investigate whether known or inferred cattle ICRs/DMRs also include these composite DNA elements for regulating parent-of-origin-specific expression. To do so, we performed genome-wide analyses of Bos taurus chromosomal DNA sequences. First, we located ZFP57 binding sites and ZFBS-morph overlaps. Subsequently, we created density-plots to pinpoint ICR positions in Bos taurus DNA. By uploading our datasets onto the UCSC genome browser, we could obtain snapshots to view peak positions with respect to genomic landmarks. These snapshots uncovered a connection between peaks in plots and the ICRs in cattle imprinting domains including H19-IGF2, KCNQ1, IGF2R, and PEG3. Additional snapshots revealed such a connection for the essential ICR in the GNAS complex locus and for ICRs in the PLAGL1, MEST, NNAT, MEG8, SNRPN, HERC3-NAP1L5, and INPP5F loci. Since peaks in plots could locate known ICRs/DMRs in Bos taurus, we anticipate that with our approach one could discover candidate ICRs and novel imprinted genes for experimental validation.
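The density-plot step outlined above can be sketched in a few lines of Python. This is a minimal illustration rather than the pipeline used in the study: the window size, the step, the chromosome label, and the function names `window_density` and `to_bedgraph` are assumptions introduced here, and the input is simply a list of start coordinates of ZFBS-morph overlaps obtained elsewhere. The output follows the bedGraph layout that the UCSC genome browser accepts for custom tracks.

```python
from bisect import bisect_left


def window_density(positions, chrom_length, window=1000, step=100):
    """Count elements whose start lies in each window [start, start + window).

    The window and step values are illustrative, not taken from the text.
    """
    ordered = sorted(positions)
    last_start = max(chrom_length - window, 0)
    density = []
    for start in range(0, last_start + 1, step):
        lo = bisect_left(ordered, start)
        hi = bisect_left(ordered, start + window)
        density.append((start, hi - lo))
    return density


def to_bedgraph(chrom, density, step=100, track_name="overlap_density"):
    """Format non-empty windows as bedGraph lines (0-based, half-open).

    Each window count is attached to the interval [start, start + step)
    so that consecutive records do not overlap.
    """
    lines = [f'track type=bedGraph name="{track_name}"']
    for start, count in density:
        if count > 0:
            lines.append(f"{chrom}\t{start}\t{start + step}\t{count}")
    return lines


# Hypothetical coordinates, for demonstration only.
example_overlaps = [1200, 1350, 1600, 9000]
example_plot = window_density(example_overlaps, chrom_length=12000)
print("\n".join(to_bedgraph("chr29", example_plot)[:4]))
```

Windows that contain several closely spaced overlaps produce the highest values, which is what appears as a peak when the track is displayed.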

Results

Our datasets consist of genomic positions of ZFP57 binding sites, ZFBS-morph overlaps, and peaks in the density-plots. To evaluate the power of our strategy, we closely inspected peak positions in genomic sequences reported for mouse, human, and Bos taurus (Table 1). In addition to previously published reports [26, 40, 41], we also checked peak positions with respect to experimental data reported in GEO series GSE77444. Briefly, the experimental approach was to use mouse embryonic stem cells (ESCs) E14 to locate regions associated with ZFP57, KAP1, and H3K9me3 marks in chromatin [25]. Such associations would reveal the positions of ICRs/gDMRs in genomic DNA [22, 25]. Notably, for the mm9 build of the mouse genome, GSE77444 also included a link for viewing the individual data series in tracks displayed on the UCSC genome browser [25]. Additional file 1 gives snapshots demonstrating that peaks in the density-plots occur precisely in regions associated with ZFP57, KAP1, and H3K9me3 marks in chromatin prepared from mouse ESCs. Based on these and other evaluations, we reasoned that with our strategy, one could locate known and candidate ICRs in mammalian DNA. To further investigate this idea, we first give examples of the positions of ZFP57 binding sites, ZFBS-morph overlaps, and peaks in the density-plots in the context of known ICRs or inferred DMRs dispersed along the Bos taurus genome. Afterward, we cover examples illustrating that with our strategy we could discover novel candidate ICRs and imprinted genes in cattle DNA.
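The comparison of peak positions with experimentally defined regions amounts to an interval-overlap test. The sketch below shows one way to express it, assuming peaks and ChIP-derived regions are available as `(chrom, start, end)` tuples with half-open coordinates. The function names and all coordinates are hypothetical and are not taken from GSE77444.

```python
def intervals_overlap(a, b):
    """Return True if two (chrom, start, end) half-open intervals share a base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def supported_peaks(peaks, regions):
    """Keep the peaks that overlap at least one experimental region."""
    return [p for p in peaks if any(intervals_overlap(p, r) for r in regions)]


# Hypothetical intervals, for demonstration only.
demo_peaks = [("chr7", 100, 200), ("chr7", 500, 600), ("chr2", 100, 200)]
demo_regions = [("chr7", 150, 300), ("chr2", 200, 400)]
print(supported_peaks(demo_peaks, demo_regions))
```

With half-open coordinates, intervals that merely touch (one ends where the other begins) are not counted as overlapping.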

Table 1 Peaks in the density-plots locating known or inferred ICRs in 3 mammalian species

Along Bos taurus chromosomes, peaks in the density-plots pinpointed known, unknown, and inferred ICRs/DMRs

In mice, the expression of H19, Igf2, and Ins2 is regulated by a single ICR/gDMR positioned upstream of H19 [18]. The noncoding RNA gene (H19) is transcribed from the maternal allele. Igf2 and Ins2 are expressed from the paternal allele and impact fetal growth and body size. As in mice, the H19-IGF2 imprinted domain is important to normal growth and fetal development in cattle [2, 11, 42, 43]. In cloning studies, deceased newborn calves displayed abnormal expression of H19 and IGF2. In normal surviving adults, the expression of IGF2 in muscle was highly variable [44]. Furthermore, aberrant methylation of a DMR upstream of H19 produced abnormal calves and LOS [11, 45]. In the H19 DMR, one study found 33 CpGs in 600 bp [11]. Another study detected a 300-bp DMR approximately 6 kb upstream of the H19 promoter [46]. This DMR mapped to a predicted CpG island with a CTCF-binding site corresponding to the consensus sequence 5’-GCGGCCGCGAGGCGGCAGTG-3’ [46].

In the density-plot of Chr29, we noticed a robust peak in a CpG island (CpG45) upstream of H19 (Fig. 1). The peak position agrees with the reported DMR in cattle DNA [46]. Furthermore, the length of CpG45 is about the same as that of the CpG-rich DNA (600 bp) selectively methylated in the paternal allele in cattle [11]. Additionally, under the density-peak, we located 3 predicted CTCF sites (Fig. 1). In the analyzed DNA segment, one site (CTCF_Ren) corresponds to the consensus sequence (GCGGCCGCGAGGCGGCAGTG) identified previously [46]. Three sites are under the density peak and map to the CpG island that includes the experimentally identified cattle DMR [46]. The 5th site is downstream of CpG45 (Fig. 1). Thus, we inferred that our approach correctly located the ICR in the cattle H19-IGF2 imprinted domain.
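Locating exact matches to the reported CTCF consensus sequence is a simple string search on both strands. The following sketch assumes the DNA segment is available as a plain string; the helper names and the test sequence are hypothetical, and the tools actually used to predict the CTCF sites in Fig. 1 are not specified in the text.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

# Consensus sequence quoted in the text [46].
CTCF_CONSENSUS = "GCGGCCGCGAGGCGGCAGTG"


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_motif(seq, motif):
    """Return sorted (0-based start, strand) pairs for exact motif matches.

    A palindromic motif would be reported on both strands; the consensus
    used here is not palindromic.
    """
    seq = seq.upper()
    motif = motif.upper()
    hits = []
    for strand, pattern in (("+", motif), ("-", reverse_complement(motif))):
        start = seq.find(pattern)
        while start != -1:
            hits.append((start, strand))
            start = seq.find(pattern, start + 1)
    return sorted(hits)


# Hypothetical sequence with one match on each strand.
demo_seq = "AAAA" + CTCF_CONSENSUS + "TTTT" + reverse_complement(CTCF_CONSENSUS) + "CC"
print(find_motif(demo_seq, CTCF_CONSENSUS))
```

An exact-match search of this kind only recovers sites identical to the consensus; degenerate sites would need a position weight matrix or a similar scoring scheme.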

Fig. 1

A robust peak identifying the ICR of the H19-IGF2 imprinted domain. From top to bottom, tracks display the positions of: cattle RefSeq genes, in pack format; the CpG islands, in pack format; peaks in density-plot, in full format; ZFBS-morph overlaps, in pack format; ZFP57 binding sites, in dense format. Orange bars mark the positions of predicted CTCF binding sites. Short match corresponds to a previously reported consensus CTCF binding sequence

As in mice [47], the KCNQ1 imprinted domain in cattle is adjacent to the H19-IGF2 domain and is regulated by an ICR known as the KvDMR1 [48]. In mice, the KvDMR1 encompasses the Kcnq1ot1 promoter and regulates imprinted expression of several protein-coding genes [42, 47, 49, 50]. This intragenic ICR is selectively methylated in oocytes but not in sperm [42, 51]. Thus, while Kcnq1ot1 is transcribed from the paternal allele producing a noncoding RNA, the expression of several protein-coding genes is repressed in the maternal allele [42]. In mice, targeted deletion of the KvDMR1 caused loss of imprinting and growth deficiency [52]. In cattle, the KvDMR1 is the genomic region most commonly affected in LOS [8, 12, 15, 45, 48].

In the density-plots of Bos taurus DNA, we observed a very robust peak in one of the KCNQ1 introns (Fig. 2). To ensure that this peak was locating the KvDMR1, we initially selected the 5’ and 3’ ends of human KCNQ1OT1 to perform a BLAT search at the UCSC genome browser. Due to length constraints imposed by BLAT, we could not use the entire human KCNQ1OT1 as the query. Nonetheless, from the BLAT output we could deduce that the peak was in the vicinity of KCNQ1OT1. For additional validation, we examined the results of a study that localized the cattle KvDMR1 using 2 primers to amplify an intragenic DNA segment. In that study, statistical analysis showed a significant difference in the methylation level between the two parental alleles, confirming that the amplified DNA corresponded to the KvDMR1 [53]. Therefore, we performed another BLAT search using the sequences of the 2 primers selected for amplifying that intragenic DNA [53]. A close-up view shows that the primers are within a relatively long CpG island that encompasses the robust peak in the density-plots (Fig. 2). Thus, that peak located the central position of the KvDMR1 in cattle. Notably, while the KvDMR1 in mice encompasses 2 ZFBS-morph overlaps [26], the KvDMR1 in cattle encompasses 7. Therefore, it seems that ICRs display species-specific differences in the number of ZFBS-morph overlaps they encompass.

Fig. 2

A robust peak pinpointing the KvDMR1. Previously, a DNA segment was amplified for locating the KvDMR1 in bovids. The plot shows the position of the two primers selected for amplification reaction. ZFBS-morph overlaps are shown in packed and ZFP57 binding sites in dense formats

Next, we examined the density-plots in a DNA segment that encompasses the PLAGL1 locus (Fig. 3). Comparative analyses have identified a group of transcription factors with a zinc finger at their amino-terminus [54]. This group includes PLAG1, PLAGL1, and PLAGL2. One of the PLAGL1 transcripts (ZAC1) encodes a protein that inhibits tumor-cell-proliferation through the induction of cell cycle arrest and apoptosis [55]. In contrast, PLAG1 and PLAGL2 are proto-oncogenes [55]. ZAC1 is an intragenic maternally imprinted transcript [56,57,58]. In both the mouse and human genomes, transcription of ZAC1 originates within an intronic sequence in PLAGL1. The intronic DNA includes another imprinted gene known as HYMAI [26, 57, 58]. Both ZAC1 and HYMAI are selectively expressed from the paternal allele. Their ICR/gDMR corresponds to an intragenic CpG island [26, 57, 58]. When the gDMR from human DNA was transferred into mice, it acted as an ICR and regulated allele-specific expression [59]. In LOS induced by assisted reproduction, imprinting was compromised producing biallelic expression [8, 13].

Fig. 3

A robust peak locating the ICR for imprinted expression of the PLAGL1 transcript known as ZAC1. The peak is at the correct position with respect to two human imprinted transcripts: one corresponding to HYMAI; the other to ZAC1

As previously observed in mice [60], PLAGL1 locus in cattle encompasses a cluster of ZFBS-morph overlaps. In the density-plots, that cluster defines a peak (Fig. 3). However, instead of several PLAGL1 transcripts, cattle RefSeq Genes displayed only one. We suspected that this transcript corresponded to a previously reported annotation. Specifically, in a panel of bovine imprinted genes, PLAGL1 was among the loci that acquired methylation marks in an oocyte size-specific manner [61]. In that panel, a putative DMR was localized in a CpG island at the 5’ end of a single transcript referred to as PLAGL1/ZAC1 [61]. To obtain clues about a complete annotation, we inspected our predicted ICR-position with respect to Non-Cow RefSeq Genes. In the context of human RefSeq Genes, we noticed 2 PLAGL1 transcripts and a noncoding RNA gene marked as HYMAI (Fig. 3). Therefore, it seems likely that ZAC1 corresponds to the transcript annotated as PLAGL1 in cattle RefSeq Genes. In that context, the peak in cattle DNA is intragenic –as observed in human genomic DNA [41]. Hence, we are tempted to conclude that our strategy has pinpointed the reported putative ZAC1 DMR [61], and to deduce that in cattle PLAGL1 locus, the putative ZAC1 DMR is a bona fide ICR (Fig. 3).

Next, we examined the density-plots in a region encompassing the IGF2R-AIRN imprinted domain. In the mouse, the Airn promoter lies in the second intron of Igf2r [62]. Also known as Air, Airn specifies an imprinted cis-silencing noncoding transcript [62]. Igf2r (insulin-like growth factor type-2 receptor) is maternally expressed and impacts fetal and placental growth [63, 64]. Its functions include transport of IGF2 into cells and to lysosomes for degradation [65]. In mice, Igf2r knockout caused fetal overgrowth and neonatal lethality [66]. The Igf2r-Airn imprinted domain includes two differentially methylated CpG islands [64]. DMR1 encompasses the Igf2r promoter. The CpG island that encompasses DMR2 is intragenic and incorporates the Airn promoter [67]. DMR2 regulates expression of Igf2r from the maternal allele and of Airn from the paternal allele [68]. In mice, deletion of DMR2 caused biallelic Igf2r expression [67]. In cattle, the IGF2R-AIRN imprinted domain also includes the DMR2 that regulates imprinted expression. In SCNT experiments, IGF2R expression was consistently biallelic, regardless of the source of cattle embryos [11].

While the IGF2R-AIRN imprinted domain in Bos taurus encompasses many CpG islands, one could assume that DMR1 corresponds to the island at the 5’ end of IGF2R and DMR2 to the intragenic island that maps to the AIRN promoter (Fig. 4). In that context, the density-plots show a clearly apparent, robust peak in the CpG island at the AIRN promoter (Fig. 4). Since this peak is in DMR2, we deduce that our approach has pinpointed the ICR in the IGF2R-AIRN imprinted domain in cattle. Notably, while the gDMR in the mouse genome encompasses 7 ZFBS-morph overlaps [26], the gDMR in Bos taurus encompasses 3. Thus, as mentioned above, the number of ZFBS-morph overlaps in ICRs could be species-specific.

Fig. 4

A robust peak defining the ICR of IGF2R imprinted domain in cattle genome. Displayed in packed formats are the genomic positions of the CpG islands, ZFBS-morph overlaps, and ZFP57 binding sites

Next, we examined the density-plots in a region encompassing the complex GNAS locus (Fig. 5). In both the human and mouse genomes, the GNAS locus encompasses multiple DMRs, promoters, and allele-specific transcripts [69]. Since several transcriptional variants are produced from differential exon utilization, they are collectively referred to as GNAS. The well-studied Gnas locus in mice includes three groups of protein-coding transcripts (Nesp55, XLas, and Gas). Although these transcripts share alternative exons, they are regulated by separate promoters [69, 70]. Among the transcripts, Nesp55 is expressed from the maternal allele, XLas from the paternal allele, and Gas from both alleles [70]. Nesp55 specifies a neuroendocrine secretory protein known as SCG6. XLas and Gas transcripts are related. Their products function in signal transmission by G protein-coupled receptors (GPCRs) and display distinguishable properties [71,72,73,74]. The locus also includes an imprinted gene (Nespas) transcribed into a noncoding antisense RNA [69].

Fig. 5

Peaks predicting the position of the essential ICR in the GNAS complex locus. The arrow points to the result of a BLAT search identifying the longer GNAS transcript as NESP55 (encoding SCG6). The predicted ICR position seems correct with respect to the annotation of Non-Cow RefSeq Genes including human GNAS-AS1

At the genome browser, we noticed that in contrast to complete annotations reported for mouse and human genomic DNA, cattle RefSeq genes displayed only two transcripts (Fig. 5). From a publication [75], we inferred that the shorter transcript was likely to encode the Gas subunit of GPCRs. In cattle, SNPs within that transcript were associated with performance traits [75]. Even though not displayed on the browser, for cattle a previous study identified GNASXL as a paternally expressed transcriptional isoform. The study also described a maternally expressed transcript designated as GNAS or NESP55 [76].

Because the UCSC genome browser’s gene annotations appeared incomplete, we performed a BLAT search to locate NESP55 in Bos taurus DNA. For query we chose the amino acid sequence of human SCG6. Based on the output, we inferred that on the browser, the longer GNAS transcript corresponded to NESP55 (Fig. 5). Cattle NESP55 transcripts are expressed monoallelically in many tissues [76]. Furthermore, in cattle, the GNAS locus includes a DMR (a putative ICR) that is hypomethylated in the paternal allele and hypermethylated in the maternal allele [76].

In the density-plots of the GNAS locus in cattle, we observed two clusters of ZFBS-morph overlaps producing two peaks in Bos taurus DNA (Fig. 5). Similarly, a previous study noticed two clusters of ZFBS-morph overlaps in the mouse locus [26]; also see Additional file 1. In cattle, both peaks are in the 1st intron of the transcript corresponding to NESP55 (Fig. 5). Notably, the essential ICR in the human locus includes two DMRs: one DMR maps to the first exon of XLas; the other is near the GNAS-AS1 TSS [69]. With respect to human RefSeq Genes, peak positions in cattle DNA map to human XLas and to a region upstream of GNAS-AS1 (Fig. 5). In toto, we could infer that our strategy predicted the genomic position of the essential ICR in the complex GNAS locus in Bos taurus.

Next, we inspected the PEG3 imprinted domain in Bos taurus DNA (Fig. 6). Sequence analyses have predicted that the PEG3 domain encompasses many CpG islands [77]. Consistent with this prediction, at the genome browser we observed several CpG islands (Fig. 6). The island encompassing the PEG3 promoter was the only area that showed DMR status in cattle [77]. In that area, the density-plots revealed a peak (Fig. 6). Since the DMR was methylated in an allele-specific manner [77], we deduced that the peak pinpointed the ICR in the cattle PEG3 imprinted domain. Furthermore, earlier studies of mice revealed that clusters of ZFBS-morph overlaps mapped to functionally important landmarks, including DNase I hypersensitive sites and repressive H3K9me3 marks in chromatin [26]. In the mouse, these clusters were in the 1st Peg3 intron. Likewise, for cattle we observe an intragenic peak. Therefore, our strategy pinpointed the central portion of the ICR in the cattle PEG3 imprinted domain (Fig. 6).

Fig. 6

A robust peak locating the ICR in the PEG3 imprinted domain. As detailed in the text, this peak is within the CpG island that is methylated in a parent-of-origin-specific manner

In mice, the Peg3 domain comprises several genes [78]. The product of Peg3 is a relatively large nuclear protein with 12 zinc fingers of the C2H2 type [79]. In mice, a mutation in Peg3 caused a striking impairment of maternal behavior [80]. Due to the dearth of maternal care, the litters developed poorly and often died [80]. Mechanistically, mutant mothers were deficient in milk ejection, partly due to defective neuronal connectivity as well as a reduced number of oxytocin neurons in the hypothalamus [80]. In domesticated animals, oxytocin is an indicator of psychological and social well-being [81]. As observed in mice and humans, PEG3 is expressed from the paternal allele in cattle [2, 77]. As a consequence of assisted reproduction, the PEG3 domain has undergone a global loss of imprinting, producing LOS [8].

Next, we examined the density-plots in a region encompassing the MEST locus (Fig. 7). In cattle, among 8 investigated genes, only MEST showed differential expression in day 21 parthenogenetic embryos [82]. In mice, the TSS of a paternally expressed transcript originates from the 1st Mest intron [83, 84]. This transcript is also known as Peg1 and regulated by an intragenic ICR.

Fig. 7

A robust peak locating the ICR for imprinted expression of the MEST variant. The peak position is correct in the context of Non-Cow RefSeq Genes for human MESTIT1 and MEST transcriptional variant

In a panel of maternally imprinted genes in cattle, researchers localized putative DMRs in several imprinted loci including MEST [61]. By CpG analyses, they inferred that the putative Peg1/MEST DMR mapped to a CpG island [61]. Consistent with this prediction, the density-plots revealed a peak in a CpG island in the MEST locus (Fig. 7). Since on the browser we observed a single MEST transcript, we inspected peak position with respect to Non-Cow RefSeq Genes. In the context of human RefSeq Genes, the peak in plots is upstream of MESTIT1 and near one of the MEST short isoforms. Also known as PEG1-AS, human MESTIT1 is a paternally expressed non-coding RNA gene [85, 86]. Note that with respect to human MEST and MESTIT1, peak location agrees with the position of reported imprinted DMR in cattle [61]. Hence, we deduced that our strategy correctly pinpointed the ICR regulating the expression of PEG1/MEST transcript in Bos taurus (Fig. 7).

Next, we examined the density-plots in a DNA segment that includes NNAT (Fig. 8). As in the mouse locus [7, 84], in cattle NNAT lies in the single BLCAP intron (Fig. 8). In mice, Nnat (also known as Peg5) is an imprinted gene, whereas Blcap is expressed biallelically [7]. Several alternatively spliced isoforms are produced from Nnat. As in humans and mice [7, 87, 88], in cattle two NNAT transcripts are expressed from the paternal allele [89].

Fig. 8

A robust peak locating a candidate ICR for imprinted NNAT expression. This peak is in CpG59 and maps to a known cattle DMR

In cattle DNA, one CpG island (CpG37) is near the BLCAP TSS and another island (CpG59) is intragenic (Fig. 8) [7]. In the vicinity of NNAT, the density-plots include two peaks. The robust one is in CpG59 and encompasses the NNAT promoter. In humans, the CpG island at the NNAT promoter was differentially methylated in all examined tissues [88]. Therefore, CpG59 is the likely position of the ICR that regulates imprinted NNAT expression in cattle DNA (Fig. 8). Overall, the robustness of peaks depends on the number of ZFBS-morph overlaps they encompass. Peaks that cover 2 ZFBS-morph overlaps could be true or false positives [40]. Peaks that cover 3 are more reliable (Fig. 8).
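The rule of thumb stated above, that peaks covering 2 overlaps may be true or false positives while peaks covering 3 are more reliable, can be written as a small classifier. The function name and the labels "ambiguous" and "none" are assumptions made for illustration; only the thresholds of 2 and 3 come from the text.

```python
def classify_peak(n_overlaps):
    """Label a peak by the number of ZFBS-morph overlaps it covers."""
    if n_overlaps >= 3:
        return "robust"      # 3 or more overlaps: more reliable
    if n_overlaps == 2:
        return "ambiguous"   # could be a true or a false positive
    return "none"            # fewer than 2 overlaps: not treated as a peak


# Hypothetical counts, for demonstration only.
for count in (1, 2, 3, 7):
    print(count, classify_peak(count))
```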

Next, we examined the density-plots in a region that encompasses MEG8 (Fig. 9). In sheep, MEG8 is expressed from the maternal allele, producing a noncoding RNA [90]. In an 8-wk-old animal, MEG8 was preferentially expressed in skeletal muscle [90]. In Angus calves, maternal diet during pregnancy impacted MEG8 expression in longissimus dorsi muscle [91]. In adult cattle, MEG8 was expressed in several tissues, including heart, liver, spleen, lung, kidney, brain, subcutaneous fat, and skeletal muscle [92]. In heterozygous cattle, MEG8 was expressed from only one of the two parental alleles, suggesting that MEG8 is an imprinted transcript [92]. In the density-plots, we observed a robust peak predicting a candidate ICR for allele-specific expression of MEG8 in cattle. This peak is intragenic and maps to a CpG island (Fig. 9).

Fig. 9

A robust peak predicting the ICR for imprinted expression of MEG8. This peak is intronic and maps to a CpG island

Furthermore, a peak in the density-plots located a candidate ICR for imprinted expression of SNRPN (Fig. S1). This peak is in a CpG island that includes the SNRPN promoter, TSS, and first exon (Fig. S1). In cattle, SNRPN alleles were unmethylated in sperm, methylated in oocytes, and approximately 50% methylated in somatic samples [93]. When compared to in vivo and in vitro-produced embryos, the CpG island at the 5’ end of SNRPN was abnormally hypomethylated in SCNT-derived Day 17 elongating embryos [93].

A peak in the density-plots also located a candidate ICR for imprinted expression of NAP1L5 in Bos taurus (Fig. S2). As in the mouse [94], NAP1L5 is paternally expressed in cattle [89]. In the mouse, Nap1l5 is entirely contained in an intron of another gene (Herc3) and is transcribed in the opposite orientation [95]. Inspection of the density-plots revealed a peak in an intronic region in the HERC3 locus in Bos taurus (Fig. S2). This peak is in a CpG island (CpG28) at the 5’ end of NAP1L5 and thus defines a candidate ICR for imprinted gene expression. Since the peak is at a location similar to that of the known Nap1l5 ICR in the mouse, we infer that our strategy pinpointed the NAP1L5 ICR in cattle.

Notably, in evaluations of plots, we also observed a peak in a CpG island at the 5’ end of DGAT1 (Fig. S3). This finding predicted that DGAT1 is an imprinted gene. Consistent with this interpretation, in cattle DGAT1 is expressed from the maternal allele [2]. In metabolic pathways, diacylglycerol acyltransferase 1 (DGAT1) catalyzes the last step in triacylglycerol synthesis in the mammary gland. Therefore, DGAT1 underlies genetic variations in milk-fat composition of dairy cows [96]. In cattle, buffalo, goat, and sheep, DGAT1 is associated not only with milk but also with meat characteristics [97]. Furthermore, DGAT1 was positively selected among European Bos taurus breeds with large phenotypic effects [98].

Within Bos taurus chromosome 26, a peak predicted a candidate ICR for imprinted INPP5F_V2 expression

INPP5F is an inositol 4-phosphatase that functions in the endocytic pathway [99]. In mice, Inpp5f_v2 was identified as an imprinted gene in the brain [100]. Inpp5f_v2 is a variant of Inpp5f. It is a retrogene with a unique alternative first exon [100]. Inpp5f_v2 transcription originates in a CpG island within intron 15 of Inpp5f [100]. In the mouse, Inpp5f is biallelically expressed, whereas Inpp5f_v2 is transcribed from the paternal allele [100]. In that locus, CpG analyses identified two CpG islands [100]. CpG1 is near the 5′ end of Inpp5f; CpG2 is at the 5′ end of Inpp5f_v2. Bisulfite sequencing of CpG1 showed that both Inpp5f alleles were hypomethylated, as would be expected for a nonimprinted transcriptionally active gene. In mouse brain, CpG2 was methylated on the maternal allele of Inpp5f_v2, but not on the paternal allele [100]. Thus, the intragenic island regulates imprinted Inpp5f_v2 expression [100]. Similarly, the promoter of the human INPP5F_V2 transcript is embedded within a maternally methylated DMR [101].

Even though we could not find any report concerning cattle INPP5F_V2 in literature surveys, we reasoned that our strategy may provide clues into imprinted expression of this retrogene in Bos taurus.

However, the genome browser did not include any annotation for INPP5F in the bosTau8 build. Thus, we were dealing with a blank canvas to explore whether or not our strategy could locate a candidate ICR in the absence of complete information. To explore this idea, we located INPP5F transcripts in the context of Non-Cow RefSeq Genes. With respect to human transcripts, our approach revealed a candidate ICR for imprinted expression of INPP5F_V2 in cattle (Fig. 10). This candidate maps to a CpG island (CpG55) upstream of MCMBP. In both human and mouse, this gene is downstream of INPP5F. Thus, we were in the correct genomic area in cattle (Fig. 10). With respect to human transcripts, the candidate ICR is at the 5’ end of INPP5F_V2. Furthermore, with respect to human transcripts, CpG55 in cattle is intragenic. Thus, as demonstrated for the mouse (Additional file 1), our strategy could find an intragenic ICR for imprinted expression of INPP5F_V2 in cattle (Fig. 10).

Fig. 10

With respect to Non-Cow RefSeq Genes, a robust peak predicting the ICR for imprinted expression of INPP5F_V2 in cattle. As observed for mice and humans, this peak is in an intragenic CpG island upstream of MCMBP

The density-plots could facilitate locating candidate ICRs in Bos taurus

Since the density-plots were created genome-wide, we could view peak positions along the DNA in an entire chromosome. For example, examine a snapshot from the UCSC genome browser presenting peak positions along the entire DNA in Chr5 (Fig. 11). In the bosTau8 build, this chromosome covers more than 121.2 Mb of DNA. We find that even along an entire chromosome, many peaks are clearly or almost fully resolved. This and related findings demonstrate that peaks in plots occur infrequently in cattle DNA. This outcome is expected because CpG frequency is relatively low in animal DNA [102]. Since peaks in plots encompass several CpGs, they represent uncommon events along genomic DNA. Additionally, as one would expect, peaks covering 3 or more ZFBS-morph overlaps are sparser than those that cover 2 (Fig. 11).
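The statement that CpG frequency is relatively low can be made concrete with the widely used observed-to-expected CpG ratio, (number of CpG × N) / (number of C × number of G), where N is the sequence length. The sketch below is offered only as an illustration of that measure; the text does not say that this ratio was computed in the study, and the function name and test sequences are hypothetical.

```python
def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (CpG count * N) / (C count * G count)."""
    seq = seq.upper()
    n = len(seq)
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")
    if c_count == 0 or g_count == 0:
        return 0.0  # ratio is undefined without both C and G; report 0
    return cpg_count * n / (c_count * g_count)


# Hypothetical sequences, for demonstration only.
print(cpg_obs_exp("CGCGCGCG"))  # CpG-dense
print(cpg_obs_exp("CCCCGGGG"))  # same base composition, a single CpG
```

Two sequences with identical base composition can differ widely in this ratio, which is why it is used to distinguish CpG islands from bulk genomic DNA.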

Fig. 11

A snapshot of the density-plot obtained for the entire Chr5. Three very robust peaks point to candidate ICRs predicting imprinted gene expression from KRT, HMGA2, FOXRED2 loci. Peaks covering 2 ZFBS-morph overlaps could be true or false-positives

Since in plots, several peaks correctly identified regions that include known ICRs and inferred DMRs (Figs. 1, 2, 3, 4, 5, 6, 7, 8, 9 and 10), we imagined that additional peaks may correspond to candidate ICRs. Along Chr5, examples include 3 very robust peaks mapping to HMGA2, KRT, and FOXRED2 loci (Fig. 11). On the genome browser, we could obtain enlarged views to inspect peak positions with respect to nearby genes (Figs. 12, 13 and 14). Along Chr5, one of the 3 very robust peaks maps to a CpG island in an unannotated region in Bos taurus. With respect to Non-Cow RefSeq Genes, the peak is at the 5’ end of HMGA2 (Fig. 12). HMGA2 belongs to a superfamily of nuclear proteins [103]. Their structure includes a domain (AT-Hook) that binds DNA and nucleosomes with no strong preference for the underlying DNA sequence [103]. Polymorphisms in HMGA2 affected height in human and body stature in cattle [104]. This gene also impacts body size in mice [105], rabbits [106], horses [107], Shetland ponies and other small horses [108]. By applying genome-wide association studies for cattle stature, meta-analysis identified common genes that impacted body size [109]. The gene list included HMGA2. The association studies involved 58,265 cattle from 17 populations with 25.4 million imputed whole-genome sequence variants.

Fig. 12

A closeup view of a candidate ICR for imprinted expression of HMGA2. The corresponding peak is in a CpG island

Along Chr5, another very robust peak maps to the KRT (keratin) locus (Fig. 13). Keratin is one of the most important structural proteins in nature and is widely found in the integument of vertebrates [110]. After collagen, it is the most important biopolymer encountered in animals. In cow, sheep, goat, and pig, keratin is a constituent of skin, fur, wool, and hoof [110]. As in other animal genomes, cattle DNA includes a segment encompassing a cluster of KRT genes (Fig. 13). In the density-plot of a large section of Chr5, we noticed a robust peak mapping to a CpG island downstream of KRT4. However, when examined in the context of Non-Cow RefSeq Genes, it seems that the island maps to a KRT gene that is not annotated in Bos taurus DNA (Fig. 13). Along Chr5, another very robust peak maps to a CpG island that encompasses the promoter of FOXRED2 (Fig. 14). Also known as ERFAD, FOXRED2 is a luminal flavoprotein that functions in endoplasmic reticulum-associated degradation [111]. It facilitates the dislocation of certain endoplasmic reticulum-associated substrates to the cytosol. Although not in the very robust category, another peak in Chr5 maps to NR4A1 (Fig. 15). Nonetheless, the peak is robust since it encompasses 3 ZFBS-morph overlaps. It maps to the 5’ end of one of the human NR4A1 transcriptional isoforms. For display we chose NR4A1 since it was among the candidate genes potentially driving tissue-specific differences between genetically distinct subspecies of cattle arising from independent domestication events [112].

Fig. 13

A candidate ICR in the KRT locus. The corresponding peak is in a CpG island. Although the peak is upstream of KRT4 in cattle, the predicted ICR might be regulating imprinted expression of an unannotated gene in Bos taurus genomic DNA

Fig. 14

A closeup view of a candidate ICR for imprinted expression of FOXRED2. The peak is in a CpG island

Fig. 15

A candidate ICR regulating predicted imprinted NR4A1 expression. The corresponding peak maps to a CpG island

In the context of body stature, in addition to HMGA2 we came across LCORL. The density-plots include a very robust peak in the CpG island at the LCORL promoter (Fig. 16). Polymorphisms in LCORL are associated with stature in dog [113], horse [107, 114], pig [115], and sheep [116, 117]. In cattle, LCORL is in a chromosomal section that includes NCAPG (Fig. 16). In genome-wide association studies of cattle stature, LCORL and NCAPG were among the genes regulating body size [109]. Furthermore, both LCORL and NCAPG were associated with loci that underwent selective sweeps in Bos taurus populations [98]. In beef cattle, both loci influence feed intake, meat, and carcass traits [3]. Functionally, LCOR and LCORL encode proteins that balance PRC2 subtype activities [118]. In higher eukaryotes, Polycomb group complexes (PRC) are essential for maintaining cellular identity [119].

Fig. 16

A closeup view of a candidate ICR for imprinted expression of LCORL. The corresponding peak maps to a CpG island

While LCOR and LCORL impact body stature, PTEN influences milk production [120]. With our strategy, we identified PTEN as a potential imprinted gene regulated by a candidate ICR in Chr6 (Fig. S4). This predicted ICR is in a CpG island that encompasses the PTEN promoter, TSS, and exon 1. The PTEN-AKT pathway functions in the initiation of lactation through the induction of autocrine prolactin [121]. In dairy cows, PTEN inhibited mammary gland development and lactation [120]. In mammary epithelial cells, PTEN down-regulated secretion of beta-casein, triglyceride, and lactose. Thus, PTEN played a critical role in lactation-related signaling pathways in dairy cows [120]. It may be noteworthy that in European cattle breeds, PTEN was among the genes in breed-specific selective sweeps [98].

While sampling peak positions in plots, we discovered additional candidate imprinted genes (Table 2). The listing includes SIX1. The gene product affects eye formation [122]. In mice, Six1 directly regulated the expression of gamma-crystallin genes and was essential for lens development [123]. Six1 also impacted myogenesis [124]. Mice lacking Six1 died at birth because of severe rib malformations and showed extensive muscle hypoplasia affecting most of the body muscles, in particular certain hypaxial muscles [124]. Since Six1 impacted MyoD and myogenin expression, it was required for primary myogenesis of most body muscles, particularly those of hypaxial origin [124]. In Six1-deficient mice, additional processes were affected, including development of the inner ear, nose, thymus, and kidney [125].

Table 2 A sample of candidate imprinted genes in Bos taurus

Besides SIX1, Table 2 lists SUFU. In adult mouse testis, SUFU was prominent in elongating spermatids, indicating a role for Hedgehog signaling in spermatogenesis [126]. Furthermore, SUFU was among the genes in a multi-trait meta-analysis identifying genomic regions associated with sexual precocity in tropical beef cattle [127]. Table 2 includes another gene (CNNM1) that also impacts spermatogenesis. In mouse testis, expression of Cnnm1 was associated with cell cycle and differentiation of spermatogenic cells [128]. Also listed in Table 2 is a gene (CNR1) that encodes a receptor for tetrahydrocannabinol (the principal psychoactive component of marijuana). Although prevailing studies emphasize endocannabinoid activity in the brain, this compound also affects the reproductive system in males [129].

Summary

Peaks in the density-plots correctly found 18 of the 20 fully characterized ICRs/gDMRs in mouse (Table 1). In the Bos taurus genome, peaks in plots also pinpointed nearly all known or inferred DMRs that functioned in allele-specific gene expression (Table 1). Thus, overall, the density-plots accurately identified a substantial fraction of characterized or inferred ICRs/gDMRs in mouse, human, and Bos taurus genomic DNA sequences (Table 1). Additionally, a peak in plots found a candidate ICR in the DGAT1 locus (Fig. S3). This finding predicted that DGAT1 is a potential imprinted gene in cattle. Since DGAT1 was expressed from the maternal allele [2], the strategy made a correct prediction. Importantly, in cattle DGAT1 impacted milk and meat characteristics [2, 97].

Furthermore, as previously demonstrated for human and mouse [40, 41], peaks in plots of entire chromosomal DNA predicted ICRs for many candidate imprinted genes in Bos taurus (Fig. 11). Examples include PTEN, HMGA2, and LCORL (Figs. 12, 16, and S4). In dairy cows, PTEN inhibited mammary gland development and lactation [120]. Both HMGA2 and LCORL affected body stature in cattle [104, 109]. Therefore, the predicted ICRs for HMGA2 and LCORL may contribute to studies of LOS. Table 2 lists additional candidate imprinted genes discovered by our approach.

Discussion

Over many centuries, farmers have selected animals with economically and nutritionally important traits including size, meat quality, and milk production [1, 2]. Moreover, literature surveys have identified numerous selective sweeps across 37 cattle breeds [98]. To propagate cattle with desired traits, researchers have also explored assisted reproductive technologies [8, 12, 130]. However, evidence indicates that animals produced by cloning and other assisted reproductive tools often display phenotypes of imprinting disruptions [2, 11, 13, 14, 43]. Because of the impact of genomic imprinting on producing normal calves, it is necessary to develop strategies to discover candidate ICRs regulating expression of novel imprinted genes.

Historically, studies of mouse have offered a rich source of data for identifying imprinted genes in other species including cattle [2]. Previously, we applied our predictive strategy to mouse and uncovered several candidate ICRs in the vicinity of genes that they may control [40]. In studies of the human genome, our strategy predicted ICRs for parent-of-origin specific expression of several experimentally inferred imprinted genes [41]. In human DNA, the density-plots also located candidate ICRs for potential imprinted genes associated with disease-states and developmental anomalies known as syndromes. Examples include association of: ARID1B with Coffin-Siris syndrome; PCNT with microcephalic osteodysplastic primordial dwarfism type II; IMPDH1 with Leber congenital amaurosis 11; PRDM8 with progressive myoclonic epilepsy-10; CITED2 with ventricular septal defect 2; and VAX1 with microphthalmia, cleft lip and palate, and agenesis of the corpus callosum [41]. Clearly, it could be challenging to discover such candidate imprinted genes by conventional techniques.

To investigate the suitability of our predictive strategy for studies of another mammal, we chose Bos taurus. We selected that species because of the impact of genomic imprinting on the development of normal calves. Furthermore, much is known about: developmental irregularities; epigenetic anomalies; and aberrant DNA methylation imprints in aborted clones [8, 10,11,12,13]. We envisage that our predictive strategy could offer the opportunity to investigate genetic variations among cattle breeds in the context of candidate ICRs and imprinted genes.

Our strategy facilitates ICR detection across relatively long genomic DNA sections [40, 41], and even across an entire chromosomal DNA (Fig. 11). In evaluations, we found that peaks in the density-plots pinpointed known ICRs or putative DMRs in cattle (Table 1). Examples include localization of the ICRs regulating expression of: imprinted domains (Figs. 1, 2, 4 and 6); and intragenic genes and transcripts (Figs. 7 and 8). Furthermore, our strategy predicted the central position of the ICR in the H19-IGF2 imprinted domain (Fig. 1) and the essential ICR in the complex GNAS locus (Fig. 5). It also predicted candidate ICRs in unannotated or not fully annotated loci in the bosTau8 build of Bos taurus. Examples include the PLAGL1, GNAS, and INPP5F loci (Figs. 3, 5 and 10). Moreover, peaks in plots predicted candidate ICRs for allele-specific expression of MEG8, SNRPN, and NAP1L5 (Figs. 9, S1, S2), which are known imprinted genes in cattle [9, 92, 93, 131]. Also, a peak in plots found a candidate ICR in the DGAT1 locus (Fig. S3). In cattle, DGAT1 is expressed from the maternal allele [2]. Thus, our strategy recognized DGAT1 as an imprinted gene. Since DGAT1 catalyzes the last step in triacylglycerol synthesis, it impacts the quality of meat and dairy products. Hence, it is not surprising that DGAT1 is important to milk and meat characteristics [2, 97]. Mostly, genetic variations in DGAT1 influenced milk-fat composition of dairy cows [96].

Based on described findings, we imagined that the density-plots could help with the discovery of candidate ICRs for potential novel imprinted genes in Bos taurus (Table 2). To explore this idea, one could inspect plots of an entire chromosomal DNA sequence to identify robust peaks defining known or candidate ICRs (Fig. 11). In enlarged views, one could identify genes in the vicinity of candidate ICRs and thus locate potential imprinted genes for experimental validations.

Along Chr26, covering nearly 52 Mb of DNA, many of the robust peaks were clearly or almost fully resolved [132]. Similarly, along Chr5, which encompasses nearly 121.2 Mb of DNA, several peaks were fully resolved (Fig. 11). Thus, overall, inspection of plots revealed that robust peaks occurred infrequently along chromosomal DNA. Advantages of our predictive strategy include locating candidate ICRs and imprinted genes not easily detectable or costly to find by conventional techniques. Examples include candidate ICRs in the KRT, HMGA2, and LCORL loci (Figs. 12, 13, and 16). In cattle, the KRT locus encompasses many genes (Fig. 13). In cow, sheep, goat, and pig, keratin (KRT) is a constituent of skin, fur, wool, and hoof [92]. In meta-analysis, both HMGA2 and LCORL were associated with cattle stature [109]. Hmga2 knockout mice exhibited impaired muscle development and reduced myoblast proliferation; its overexpression promoted myoblast growth [133]. This finding revealed a possible role for HMGA2 in meat quality. LCORL is among the genes regulating body size [109]. Furthermore, in Bos taurus populations, the LCORL locus has undergone selective sweeps [98]. Considering the importance of genomic imprinting in normal growth and developmental processes in cattle [2, 11, 13, 14, 43], it might be significant that our strategy identified HMGA2 and LCORL as candidate imprinted genes and ICRs for their allele-specific expression (Figs. 12 and 16). Furthermore, the density-plots revealed a candidate ICR for imprinted expression of PTEN (Fig. S4). While LCOR and LCORL impact body stature, PTEN influences milk production [120]. Table 2 lists additional candidate imprinted genes. They were discovered while we were sampling genes in the vicinity of peaks, in order to explore the power of our approach.

Conclusion

In this report, we offered a predictive genome-wide strategy to discover candidate ICRs and novel imprinted genes in Bos taurus. We gave evidence for the robustness of our strategy by pinpointing several of the well-known ICRs and inferred DMRs in various chromosomal DNA sections. We also showed discovery of candidate ICRs for known imprinted genes and transcripts in unannotated or not fully annotated segments in Bos taurus genomic DNA. We also gave examples of how, with our strategy, one could view the positions of known and candidate ICRs along entire chromosomal DNA sequences. If surveyed in the context of the wealth of known genetic variations among cattle breeds, our strategy could serve as a resource for locating candidate ICRs and their nearby genes. Clearly, predictive methods require experimental validations. Therefore, below we offer links for downloading our datasets for creating custom tracks on the UCSC genome browser. The browser offers a great resource for studies of genomic DNA sequences in higher organisms [134,135,136]. On the browser, default tracks facilitate viewing our datasets in the context of genomic landmarks including genes, transcripts, CpG islands, SNPs, and much more [137].

Methods

Marking the genomic positions of ZFP57 binding site and the ZFBS-morph overlaps

For studies of genomic imprinting in Bos taurus, we followed a previous approach applied to mouse and human genomic DNA [40, 41]. Briefly: at the UCSC genome browser, we retrieved the DNA sequences of cattle chromosomes (reported for the build bosTau8). Next, we wrote a Perl script to obtain the genomic positions of the reported hexameric ZFP57 binding site [21], and the positions of ZFBS-morph overlaps [60]. The script opened the file containing the nucleotide sequence of a specified chromosome, as well as either the file containing the ZFP57 binding site or sequences of the ZFBS-morph overlaps. Subsequently, it located their genomic positions. With UNIX subroutines, we combined the outputs obtained for various chromosomes to create a file suitable for upload onto the UCSC genome browser.
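The position-marking step described above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the script used in this study. It assumes that TGCCGC is the hexameric ZFP57 site reported in [21] and that both DNA strands are searched; the function names, the toy sequence, and the BED-style output are hypothetical choices made for illustration. The same function could be given the sequences of the ZFBS-morph overlaps instead of the hexamer.

```python
# Illustrative sketch only; not the authors' Perl script.
# Assumptions: TGCCGC is taken as the ZFP57 hexamer; both strands are scanned.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(motif):
    """Return the reverse complement of a DNA motif."""
    return motif.translate(COMPLEMENT)[::-1]


def find_motif_positions(sequence, motifs):
    """Locate every occurrence of each motif (or its reverse complement).

    Returns a sorted list of (start, end, motif) tuples using 0-based,
    half-open coordinates, the convention of BED files.
    """
    sequence = sequence.upper()
    hits = set()
    for motif in motifs:
        for pattern in {motif, reverse_complement(motif)}:
            start = sequence.find(pattern)
            while start != -1:
                hits.add((start, start + len(pattern), motif))
                # Step forward by one base so overlapping matches are kept.
                start = sequence.find(pattern, start + 1)
    return sorted(hits)


def to_bed_lines(chrom, hits):
    """Format hits as tab-separated BED lines for a custom track."""
    return [f"{chrom}\t{start}\t{end}\t{name}" for start, end, name in hits]


# Toy sequence (hypothetical): one site on each strand.
toy_sequence = "AATGCCGCTTTTGCGGCAAA"
zfp57_hits = find_motif_positions(toy_sequence, ["TGCCGC"])
bed_lines = to_bed_lines("chr5", zfp57_hits)
```

For a real chromosome, the sequence would be read from the FASTA file retrieved for the bosTau8 build, and the lines produced for the various chromosomes would be concatenated into one file for upload.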

Creating plots of the density of ZFBS-morph overlaps in genomic DNA

With another Perl script, we established the genomic positions of DNA segments that covered 2 or more closely spaced ZFBS-morph overlaps. That script opened the file containing the positions of ZFBS-morph overlaps for a specified chromosome. Subsequently, the script scanned the file to count and report the number of ZFBS-morph overlaps within a sliding window of 850 bases. We selected the window size by trial and error. Large windows tended to produce false peaks; small windows gave peaks with a spiky appearance. To remove background noise, the script ignored isolated overlaps. Next, with a UNIX subroutine, we combined and tailored the outputs of the program for display as a custom track on the UCSC genome browser. While creating our datasets, bosTau8 was the latest available build.
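The window-counting step can be sketched in the same hedged way. The Python code below is one possible reading of the description, not the script used in this study: it anchors an 850-base window at each overlap, counts the overlaps that start inside the window, discards windows holding a single (isolated) overlap, and merges windows that share overlaps into one peak whose height is the largest count. The function names, the merging rule, the example positions, and the track name are assumptions made for illustration.

```python
# Illustrative sketch only; not the authors' Perl script.
from bisect import bisect_left

WINDOW = 850  # window size in bases, as stated in the text


def window_counts(positions, window=WINDOW, min_count=2):
    """Count overlaps inside a window anchored at each overlap position.

    Windows with fewer than min_count overlaps (isolated overlaps) are
    ignored. Returns (start, last_position_in_window, count) tuples.
    """
    positions = sorted(positions)
    rows = []
    for i, start in enumerate(positions):
        # Index of the first overlap that falls outside [start, start + window).
        j = bisect_left(positions, start + window)
        count = j - i
        if count >= min_count:
            rows.append((start, positions[j - 1], count))
    return rows


def merge_peaks(rows):
    """Merge windows that share overlaps (assumed rule: keep the maximum count)."""
    peaks = []
    for start, end, count in rows:
        if peaks and start <= peaks[-1][1]:
            prev_start, prev_end, prev_count = peaks[-1]
            peaks[-1] = (prev_start, max(prev_end, end), max(prev_count, count))
        else:
            peaks.append((start, end, count))
    return peaks


def to_bedgraph(chrom, peaks, name="ZFBS-morph density"):
    """Format peaks as bedGraph text; the track name is a placeholder."""
    lines = [f'track type=bedGraph name="{name}"']
    lines += [f"{chrom}\t{start}\t{end}\t{count}" for start, end, count in peaks]
    return "\n".join(lines)


# Hypothetical start positions of ZFBS-morph overlaps on one chromosome.
overlap_starts = [100, 300, 700, 5000, 9000, 9400]
rows = window_counts(overlap_starts)
peaks = merge_peaks(rows)
track_text = to_bedgraph("chr5", peaks)
```

In this toy input, the overlap at position 5000 has no neighbor within 850 bases and therefore contributes no peak, which mirrors the removal of background noise described above.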

Localization of predicted CTCF sites

We found predicted CTCF sites using a server at the University of Tennessee Health Science Center [138]. We chose that server because previously it correctly predicted CTCF binding sites in the ICR of the human H19-IGF2 imprinted domain [139]. The positions of predicted sites agreed with results of ChIPs reporting the association of a subset of nuclear proteins (i.e., CTCF, RAD21, and SMC3) with chromatin [140].