Background

Children born following assisted reproductive technology (ART) treatments are thought to have a higher prevalence of imprinting defects [1]. One potential origin of such epimutations may lie in the oocyte and embryo culture, which are commonly part of ART procedures [2]. Apart from that, a number of studies have shown that male infertility itself is associated with aberrant DNA methylation profiles, particularly of imprinted genes [3,4,5,6], suggesting that ART may facilitate the transmission of imprinting errors in sperm cells to the next generation. This latter aspect remains a matter of debate [7].

Imprinting defects can originate at the different phases of DNA methylation erasure and establishment that occur during the development of the germline. Sperm originates from primordial germ cells (PGCs). These cells are specified early during embryo development and undergo almost complete erasure of DNA methylation, which allows the establishment of male germline-specific DNA methylation profiles during later stages of gametogenesis [8]. The erasure of DNA methylation in the PGCs takes place in two sequential stages. During the initial stage, a global decrease in methylated cytosines occurs, whereas in the second stage, methylation is removed from imprinting control regions (ICRs) and meiotic genes [9]. These phases of methylation erasure result in an epigenetic ground state with methylation levels in PGCs as low as 7–8% at week 11 of human foetal development. The process of de novo methylation was found to be re-initiated in PGCs from 19-week-old human foetuses [10]. Primate data suggest that this process continues well after birth in germ cells, which are then termed spermatogonia, and appears to be completed only during puberty [11]. Errors in the process of methylation erasure or re-establishment in a proportion of the PGCs have been considered as a possible explanation for subpopulations of sperm displaying aberrant methylation levels in the adult [12, 13]. This explanation is conceivable as those few specified PGCs undergo proliferation and give rise to the population of spermatogonia, which colonise the seminiferous cords of the testes. Apart from the ability to self-renew, spermatogonia can also give rise to differentiating daughter cells by entering spermatogenesis upon puberty. This differentiation process is based on the development of spermatogonial clones, which can result in the formation of 16 sperm cells in humans [14, 15]. Incorrect erasure or re-establishment of methylation patterns in individual PGCs could therefore lead to a population of spermatogonia giving rise, via clonal divisions, to a subpopulation of sperm with aberrant methylation profiles.

To address the presence of imprinting errors in sperm, a number of studies have assessed the methylation status of the maternally imprinted gene MEST and the paternally imprinted gene H19 in fertile and infertile men [16]. A meta-analysis suggested a 9.91-fold higher risk ratio for aberrant methylation in the differentially methylated region of H19 for infertile men. In contrast, no increased risk ratio was found for MEST [16].

Careful examination of individual studies suggests four general subgroups of patients, based on the methylation status of H19 and MEST: (1) men with normal MEST and H19 methylation; (2) men with abnormal MEST methylation; (3) men with abnormal H19 methylation and (4) men with impaired methylation patterns in both MEST and H19 [5, 17]. Using deep bisulfite sequencing, which provides single-molecule resolution, a previous study found that a proportion of sperm in oligozoospermic men was aberrant in four analysed imprinted genes (H19, MEG3, MEST and KCNQ1OT1), suggesting the presence of epigenetic mosaicism in these samples, whereas normozoospermic samples presented as an epigenetically homogeneous population [13].

In addition to target gene approaches, individual studies have employed methylation arrays to assess methylation changes at selected CpGs (up to 450,000) that may be present in sperm from infertile men (see for example Laqqan et al. [18] and Urdinguio et al. [6]). Interestingly, these studies did not report alterations in the imprinted genes H19 and MEST but did identify CpG sites associated with 48 imprinted genes displaying aberrant methylation [6]. Apart from that, a number of additional CpG sites throughout the genome, not associated with ICRs, showed aberrant DNA methylation patterns [6, 18].

As previous studies were largely focused on the analysis of a few imprinted genes and a small fraction of genomic CpG sites, we set out to analyse the genome-wide DNA methylation patterns of human sperm in normozoospermic and oligozoospermic men. For this, we used a combination of whole-genome bisulfite sequencing (WGBS), which provides information on the methylation status of nearly all of the 28,000,000 human CpG sites, and targeted deep bisulfite sequencing (DBS).

Results

Screening of H19 and MEST methylation levels in swim-up purified sperm DNA from 93 oligozoospermic and 40 normozoospermic patients

In order to select patients for whole-genome bisulfite sequencing (WGBS) analysis, we measured H19 (CTCF6 region) and MEST methylation levels by deep bisulfite sequencing (DBS) of swim-up purified sperm DNA in an age-matched cohort of 40 normozoospermic (Normal) and 93 oligozoospermic patients (Fig. 1a, b, Table 1, Additional file 1: Tables S1 and S2, Additional file 2: Fig. S1A). A principal component analysis (PCA) of H19 and MEST methylation values showed that some oligozoospermic men clearly deviated from the remaining samples (Additional file 2: Fig. S2). Since the first principal component (PC1) explains most of the variability of the samples, we considered samples with a PC1 score below the 95th percentile (0.04) as normally methylated and samples with a PC1 score above the 95th percentile as abnormally methylated (Additional file 1: Table S2). According to this threshold, we subdivided the patients into normally methylated normal controls (NC, n = 40), abnormally methylated oligozoospermic men (AMO, n = 7) and normally methylated oligozoospermic men (NMO, n = 86) (Fig. 1c, Additional file 2: Fig. S2).
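The grouping step described above can be sketched as follows. This is a minimal illustration and not the pipeline used in the study: it assumes one mean MEST and one mean H19 methylation value per sample (as fractions), derives PC1 from the 2 × 2 covariance matrix, and takes the 95th percentile over all PC1 scores as the cutoff. The function names and the sign convention are assumptions made for the example.

```python
import math
import statistics


def pc1_scores(mest, h19):
    """Project mean-centred (MEST, H19) pairs onto the first principal component."""
    mean_mest = statistics.fmean(mest)
    mean_h19 = statistics.fmean(h19)
    xs = [v - mean_mest for v in mest]
    ys = [v - mean_h19 for v in h19]
    n = len(xs)
    sxx = sum(x * x for x in xs) / (n - 1)
    syy = sum(y * y for y in ys) / (n - 1)
    sxy = sum(x * y for x, y in zip(xs, ys)) / (n - 1)
    # Orientation of the major axis of the 2x2 covariance matrix.
    # theta lies in (-pi/2, pi/2], so the MEST component is never negative:
    # higher MEST methylation maps to a higher score (assumed sign convention).
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    ux, uy = math.cos(theta), math.sin(theta)
    return [x * ux + y * uy for x, y in zip(xs, ys)]


def flag_abnormal(mest, h19, percentile=95):
    """Return one boolean per sample: True if its PC1 score exceeds the cutoff."""
    scores = pc1_scores(mest, h19)
    # statistics.quantiles with n=100 returns the 1st..99th percentile cut points.
    cutoff = statistics.quantiles(scores, n=100, method="inclusive")[percentile - 1]
    return [score > cutoff for score in scores]
```

Because the sign of a principal component is arbitrary, any real analysis would need to confirm which direction of PC1 corresponds to the somatic-like pattern (higher MEST, lower H19) before applying a one-sided cutoff.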

Fig. 1

Sperm samples selection for WGBS and establishment of groups based on MEST and H19 methylation. (a) Schematic representation of the experimental design. (b) Dot plot representing the mean methylation levels of MEST and H19 measured by deep bisulfite sequencing in 40 normal (teal) and 93 oligozoospermic (black) sperm samples. At the margins, two density plots show the distribution of the MEST and H19 mean methylation values in the normal and oligozoospermic cohort of samples (Additional file 1: Table S2). (c) Example of deep bisulfite sequencing results of MEST and H19 in the three groups: normal control (NC), abnormally methylated oligozoospermic (AMO) and normally methylated oligozoospermic (NMO). Each horizontal line of a plot represents a unique sequence read, while each vertical position represents a CpG site (methylated sites in red, unmethylated sites in blue). (d) Mean methylation values for MEST and H19 in the five NC (teal), five AMO (orange) and six NMO (purple) selected for the WGBS

Table 1 Clinical parameters of the included patient samples

Whole methylome analysis of swim-up sperm DNA from patients with normal and impaired spermatogenesis

For WGBS, we chose the five AMO samples showing the most aberrant methylation levels of MEST and H19, and randomly selected five NC and six NMO samples from those used in the screening (Fig. 1d). No significant difference in age was found between the three groups (Additional file 2: Fig. S1B). Following the recommendations by Ziller et al. [19], we sequenced the samples at 13–16× coverage (Additional file 1: Table S3). We observed a significant correlation between the methylation values measured for each sample by DBS and WGBS at the same genomic coordinates of H19 (r² = 0.84, p = 5.7 × 10⁻⁷) and MEST (r² = 0.71, p = 4.6 × 10⁻⁵). For comparative analyses, we used previously generated WGBS data of isogenic blood and sperm samples of 12 normozoospermic men (two pools of six individuals each) [20].

Evaluation of whole methylome data for the 50 known imprinting control regions

In order to determine whether, in addition to MEST and H19, other imprinted loci were also affected by aberrant methylation, we analysed the WGBS methylation values for the 50 known maternally and paternally methylated ICRs [21]. We found that the five AMO samples had abnormal methylation levels at all ICRs and that the degree of aberrant methylation at these regions was highly correlated within each sample (Fig. 2, Additional file 2: Fig. S3 and Additional file 1: Table S4). In contrast, the ICR methylation values for the six NMO samples were similar to the values observed in NC samples. Moreover, a PCA of the 20 methylomes revealed that the AMO samples are spread along the PC1 axis, while NC and NMO samples group together at the opposite extreme from the blood samples (Additional file 2: Fig. S4).

Fig. 2

Methylation levels of the oocyte genomic imprints. Box plots showing the distribution of 34 oocyte DMRs methylation values in blood and sperm DNA (Additional file 1: Table S4). Datasets from Laurentino et al. [20] appear in white (BL1 and BL2—blood, SP1 and SP2—sperm), NC sperm samples in teal, AMO sperm samples in orange and NMO sperm in purple. Box plot elements are defined as follows: center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers

Inventory of differentially methylated regions between sperm and blood derived somatic cells

To investigate whether the aberrant methylation levels in the AMO group reflect epigenetic germline mosaicism or the presence of previously undetected somatic DNA, we made an inventory of soma-germ cell-specific methylation differences. For this, we compared published WGBS data of isogenic blood and sperm samples of 12 normozoospermic men [20] with two different bioinformatic tools (camel and metilene) to identify methylation differences. By defining a differentially methylated region (DMR) as a region of at least 10 CpGs with a methylation difference of at least 80% and a minimum coverage of five reads, we detected 32,686 DMRs, of which 6159 overlap the promoter of 5892 genes (Fig. 3a). Of these genes, 2462 were among the 8175 genes previously shown to be expressed in germ cells and not in testicular somatic cells [22] and which are putatively regulated by DNA methylation of 2764 DMRs (Additional file 1: Table S5). In line with the expression analysis, almost all of these gene promoters were methylated in blood and unmethylated in sperm. Analysis of the methylation levels of the 2764 DMRs revealed that the five AMO samples have aberrant methylation at all soma-sperm specific differentially methylated genes (Fig. 3b, Additional file 2: Fig. S5 and Additional file 1: Table S5). Moreover, in each sample, the degree of aberrant methylation was similar to the levels observed for the imprinted regions (Fig. 2, Additional file 2: Fig. S3 and Additional file 1: Table S4).
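The DMR criteria stated above (at least 10 CpGs, a methylation difference of at least 80%, a minimum coverage of five reads, and the q value limit of 0.05 given in the legend of Fig. 3) amount to a simple filter on candidate regions. The sketch below is illustrative only: the segmentation itself is done by tools such as camel and metilene, and the record layout, field names and example coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateRegion:
    chrom: str
    start: int
    end: int
    n_cpgs: int
    meth_blood: float    # mean methylation in blood, as a fraction (0..1)
    meth_sperm: float    # mean methylation in sperm, as a fraction (0..1)
    min_coverage: int    # lowest read coverage across the region
    q_value: float       # multiple-testing adjusted significance


def is_soma_germ_dmr(region, min_cpgs=10, min_diff=0.80, min_cov=5, max_q=0.05):
    """Apply the thresholds quoted in the text to one candidate region."""
    return (
        region.n_cpgs >= min_cpgs
        and abs(region.meth_blood - region.meth_sperm) >= min_diff
        and region.min_coverage >= min_cov
        and region.q_value <= max_q
    )


def direction(region):
    """Label the region by the sign of the sperm-minus-blood difference."""
    if region.meth_sperm < region.meth_blood:
        return "less methylated in sperm"
    return "more methylated in sperm"


# Hypothetical candidates, for illustration only.
candidates = [
    CandidateRegion("chr1", 1000, 1600, 14, 0.95, 0.03, 8, 0.001),
    CandidateRegion("chr2", 5000, 5300, 6, 0.90, 0.05, 9, 0.001),    # too few CpGs
    CandidateRegion("chr3", 7000, 7900, 22, 0.60, 0.10, 12, 0.001),  # difference too small
]
dmrs = [c for c in candidates if is_soma_germ_dmr(c)]
```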

Fig. 3

Inventory of the sperm-soma DMRs putatively regulating promoters of 2462 testicular germ cell-specific genes. (a) Flow chart of the discovery of 2764 sperm-soma DMRs using blood datasets as somatic representatives. DMRs were required to cover at least 10 CpGs with at least 80% difference in methylation, minimum coverage of 5 reads and a maximum q value of 0.05. (b) Box plots showing the distribution of the methylation values of 2643 DMRs less methylated in sperm than in blood (left) and 121 DMRs more methylated in sperm than in blood (right) (Additional file 1: Table S5). Datasets from Laurentino et al. [20] appear in white (BL1 and BL2, blood; SP1 and SP2, sperm), NC sperm samples in teal, AMO sperm samples in orange and NMO sperm samples in purple. Box plot elements are defined as follows: center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers

Most recently, Luján et al. claimed to have identified 217 DMRs useful for fertility assessment. In their study, they analysed unpurified sperm samples by methylation-dependent immunoprecipitation (MeDIP-Seq) [23]. We determined the methylation levels of these DMRs in the blood-sperm WGBS dataset [20] and our five NC and six NMO samples (Additional file 1: Table S6). We found that the DMRs are unable to distinguish the sperm of normozoospermic men from the sperm of oligozoospermic men (Additional file 2: Fig. S6). Rather, they discriminate between clean sperm samples and sperm samples containing somatic DNA, as the 50 ICR DMRs and our inventory of 2764 soma-sperm DMRs do, but the latter do so with higher sensitivity (Figs. 2 and 3).

To further validate the findings in our patients, we performed DBS for the XIST and DDX4 loci, previously shown to be fully unmethylated in normal sperm [24], on the 40 normal controls and the 93 oligozoospermic patient samples used in the initial screening (Additional file 1: Table S2). We confirmed that each of the five AMO samples that were subjected to WGBS showed an aberrant methylation level at these two loci, which was highly correlated with the aberrant methylation in both the imprinted regions and the soma-sperm specific differentially methylated genes. All normal controls and 77 of the normally methylated oligozoospermic samples were found to have the expected XIST and DDX4 methylation levels (< 6%; Additional file 2: Fig. S7). Of the two AMO samples not analysed by WGBS, one (SOAT7) was shown to have DDX4 methylation levels consistent with the presence of somatic cell DNA (Additional file 1: Table S2). The other (SOAT6) showed aberrant methylation levels for H19 CTCF6 but was considered normal for MEST, XIST and DDX4 (Additional file 2: Fig. S8). This sample had a similar pattern in the CTCF4 region of H19, but the fraction of completely unmethylated reads was smaller. We sequenced additional ICRs and compared the DBS methylation levels in this sample (Additional file 2: Fig. S8) with those of a representative NC (VN25, Additional file 2: Fig. S9). SOAT6 has a very small proportion of completely methylated reads in the XIST, KCNQ1OT1 and PEG10 amplicons, which suggests somatic DNA contamination. In summary, we conclude that despite swim-up purification, somatic cell DNA was still present in some NMO and AMO samples and therefore these samples were excluded from further analysis.
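The exclusion rule described above can be expressed as a short screen. The sketch assumes that a sample is retained only when both XIST and DDX4 methylation fall below the 6% level quoted in the text; the sample identifiers and values are hypothetical, and borderline cases such as SOAT6 clearly need the manual inspection of individual reads described in the paragraph above.

```python
def passes_somatic_screen(xist, ddx4, cutoff=0.06):
    """True if both loci are below the cutoff (values are fractions, not percent)."""
    return xist < cutoff and ddx4 < cutoff


def split_by_screen(samples, cutoff=0.06):
    """Partition {sample_id: (xist, ddx4)} into retained and excluded sample IDs."""
    retained, excluded = [], []
    for sample_id, (xist, ddx4) in samples.items():
        target = retained if passes_somatic_screen(xist, ddx4, cutoff) else excluded
        target.append(sample_id)
    return retained, excluded


# Hypothetical measurements, for illustration only.
measurements = {
    "sample_A": (0.01, 0.02),
    "sample_B": (0.15, 0.22),
    "sample_C": (0.03, 0.09),
}
retained, excluded = split_by_screen(measurements)
```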

Identification of differentially methylated regions in sperm from normal and oligozoospermic men

To identify true DMRs between the sperm of normal and oligozoospermic men, we compared the genome-wide methylomes of six NMO and five NC sperm samples that are devoid of somatic DNA (Fig. 4a). Using two different bioinformatic tools, we identified 103 DMRs with at least five CpGs, a methylation difference of at least 0.3 and a minimum coverage of five reads (Additional file 1: Table S7). Since the genetic background (i.e. DNA polymorphisms) may affect DNA methylation [25], some DMRs may display a higher range of values within a group. Therefore, to reduce the potential influence of the genetic background, we limited the range of methylation values within the normozoospermic group to 0.3, thus keeping 19 of the 103 DMRs (Fig. 4a and Additional file 1: Table S7). Three of the 19 DMRs were hypermethylated in normozoospermic samples, while the remaining 16 were hypermethylated in the NMO patients (Fig. 4b and Additional file 1: Table S7).
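The two-step filtering described above can be illustrated with a short sketch. It assumes that the methylation difference is taken between the group means and that the within-group range is the maximum minus the minimum of the NC values (kept when below 0.3, as in the legend of Fig. 4); both are assumptions for the example, since the exact definitions used by the DMR callers are not restated here. The function name and the input values are hypothetical.

```python
from statistics import fmean


def keep_dmr(nc_values, nmo_values, n_cpgs,
             min_cpgs=5, min_diff=0.3, max_nc_range=0.3):
    """Return True if a candidate DMR passes both filtering steps."""
    # Step 1: size and effect-size thresholds (assumed: difference of group means).
    group_diff = abs(fmean(nc_values) - fmean(nmo_values))
    if n_cpgs < min_cpgs or group_diff < min_diff:
        return False
    # Step 2: limit the spread within the normal controls to reduce
    # the influence of genetic background on the comparison.
    nc_range = max(nc_values) - min(nc_values)
    return nc_range < max_nc_range


# Hypothetical per-sample methylation values (five NC, six NMO).
tight_nc = [0.10, 0.12, 0.15, 0.11, 0.14]
wide_nc = [0.05, 0.45, 0.10, 0.12, 0.08]
nmo = [0.55, 0.60, 0.58, 0.62, 0.57, 0.59]
```

With these toy values, `keep_dmr(tight_nc, nmo, n_cpgs=7)` passes both steps, whereas `keep_dmr(wide_nc, nmo, n_cpgs=7)` passes the difference threshold but is rejected because the NC values span more than 0.3.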

Fig. 4

Discovery of normal controls vs. normally methylated oligozoospermic sperm DMRs. (a) Flow chart of the discovery of 19 DMRs between NC and NMO sperm groups (left) and the distribution of DMRs according to the range of values in normal controls, with DMRs having NC range < 0.3 highlighted in grey (right). (b) Box plots showing for the 19 DMRs the distribution of the methylation values for NC (teal, n = 5), and NMO (purple, n = 6) (Additional file 1: Table S7). (c) Box plots showing for 17 DMRs the distribution of the methylation values obtained by targeted DBS in an independent cohort. VNC, validation normal control samples (teal, n = 20), VNMO, validation normally methylated oligozoospermic samples (purple, n = 20) (Additional file 1: Table S11). Statistical analyses showed no differences between the two groups (Mann-Whitney U test followed by Bonferroni correction for multiple testing; Additional file 1: Table S12). Box plot elements are defined as follows: center line: median; box limits: upper and lower quartiles; whiskers: 1.5× interquartile range; points: outliers

In order to validate the DMRs in an independent cohort, we established reliable targeted DBS assays for 17 DMRs (Additional file 1: Table S8; specific primers could not be designed for DMR6 and DMR12 due to the presence of highly homologous sequences in the genome). Although the DBS approach targets only DMR CpG subsets (coordinates in Additional file 1: Tables S7 and S8), the distributions of WGBS and WGBS CpG subset methylation values are the same, as measured by Wilcoxon signed-rank test (Additional file 1: Tables S9 and S10). Due to the limited amount of oligozoospermic sperm DNA, we first analysed 20 normal control samples (VNC) and then selected DMRs for further validation in 20 normally methylated oligozoospermic swim-up sperm DNA samples (VNMO). After sequencing the VNC samples for each of the 17 DMRs, we selected 10 DMRs based on the number of VNMO methylation values outside of the normal samples methylation range (Additional file 2: Fig. S10 and Additional file 1: Table S11). Following sequencing each of the 10 selected DMRs in the 20 VNMO samples and comparison with the VNC data, none of the DMRs could be validated (Fig. 4c and Additional file 1: Tables S11 and S12).
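The statistical comparison named in the legend of Fig. 4 (Mann-Whitney U test followed by Bonferroni correction) can be sketched with the standard library alone. This is a simplified illustration, not the analysis code of the study: the p value uses the normal approximation without tie or continuity correction, which is an assumption made to keep the example short, and all function names are hypothetical.

```python
import math


def mann_whitney_u(a, b):
    """U statistic for sample a: pairs with x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def mann_whitney_p(a, b):
    """Two-sided p value from the normal approximation (no tie correction)."""
    n1, n2 = len(a), len(b)
    u = mann_whitney_u(a, b)
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mean_u) / sd_u
    # For a standard normal variable, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

In this setting, one p value would be computed per DMR from the VNC and VNMO methylation values, and the full list would then be passed to `bonferroni` so that the correction reflects the number of DMRs tested.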

Influence of single nucleotide polymorphism (SNPs) on H19 methylation levels

Single CpG sites in the CTCF6 binding site of H19 have previously been shown to be differentially methylated in normal and infertile patients [4, 17, 26,27,28,29,30,31,32]. In order to analyse this further, we performed a PCA, using as input variables the methylation values of the individual 14 CpG sites analysed by DBS in all the individuals showing no presence of somatic cell DNA according to XIST and DDX4 assay results (n = 118, NC = 40, NMO = 77, AMO = 1; Additional file 1: Table S13). This analysis showed that the variation in PC1 was mainly due to the CpG3 methylation levels (Additional file 2: Fig. S11A, B). The peculiarity of CpG3 is also visible in the amplikyzer plots (Fig. 1c, Additional file 2: Figs. S8 and S9). CpG3 is in the vicinity of a G/A-SNP (rs10732516; Fig. 5a). Since the genotype of this SNP is masked by bisulfite treatment, we used the nearby rs2071094 SNP, which is in high linkage disequilibrium (r² = 0.99 and D’ = 1 according to annotations by HaploReg v4.1 [33]), to investigate the possible effects of these SNPs on H19 methylation values. Such an effect has previously been reported in blood and placenta [34, 35]. We observed that individuals clustered in the PCA according to their rs2071094 genotype (TT, TG, GG) (Fig. 5b). GG men showed a significantly lower CpG3 methylation compared to the individuals with TG or TT genotype (Fig. 5c), and TG men showed a significantly lower methylation in the reads corresponding to the G allele compared to the T allele (Fig. 5d and Additional file 1: Tables S14 and S15). Finally, the subdivision of patients according to the diagnosis (NC or NMO) did not show any significant difference between normal and oligozoospermic patients sharing the same genotype (Fig. 5e). The same was observed when analysing all CpGs in the H19 CTCF6 region as a whole (CpG2-4) (Additional file 2: Fig. S11C, D). This shows that the methylation levels of the H19 CTCF6 region as a whole, and particularly of CpG3, are affected by genetic variation irrespective of the fertility status.
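The stratified comparison described above, in which CpG3 methylation is compared between diagnoses only within the same rs2071094 genotype, can be sketched as a grouping step. The snippet below only summarises each stratum by its median; the significance tests reported in the text are not reproduced. The tuple layout and the toy values are hypothetical.

```python
from collections import defaultdict
from statistics import median


def median_by_genotype_and_group(samples):
    """samples: iterable of (genotype, diagnosis, cpg3_methylation) tuples."""
    strata = defaultdict(list)
    for genotype, diagnosis, methylation in samples:
        strata[(genotype, diagnosis)].append(methylation)
    return {key: median(values) for key, values in strata.items()}


# Hypothetical values, for illustration only.
toy_samples = [
    ("TT", "NC", 0.95), ("TT", "NC", 0.93), ("TT", "NC", 0.94),
    ("TT", "NMO", 0.94), ("TT", "NMO", 0.96), ("TT", "NMO", 0.92),
    ("GG", "NC", 0.60), ("GG", "NC", 0.58), ("GG", "NC", 0.62),
    ("GG", "NMO", 0.61), ("GG", "NMO", 0.59), ("GG", "NMO", 0.60),
]
medians = median_by_genotype_and_group(toy_samples)
```

Comparing NC and NMO without this stratification would mix genotype and diagnosis effects, so an uneven genotype distribution between groups could masquerade as a fertility-related difference.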

Fig. 5

Analysis of the influence of the SNP rs2071094 on the H19 methylation levels. (a) Schematic representation of the H19 CTCF6 locus showing the CpGs analysed by DBS (red and numbers 1–14) and the SNP-masked CpG (red). SNPs in high linkage disequilibrium are shown in green. Numbers on top refer to hg38 coordinates of chromosome 11. (b) Principal component analysis (PCA) of the 14 CpG sites in the H19 locus obtained by DBS for the 40 normal controls (NC), 77 normally methylated oligozoospermic (NMO) and one abnormally methylated oligozoospermic (AMO) colour-coded according to the SNP rs2071094 genotype: T/T black, T/G orange, G/G light blue. (c) Box plot showing the distribution of the CpG3 methylation in the 118 patients subdivided according to the SNP rs2071094 genotype. Statistically significant differences are denoted by letters: a—TG different from TT, b—GG different from TG and TT. P values are denoted by the number of letters, e.g. aaa p < 0.001 (Wilcoxon rank-sum test; Additional file 1: Tables S13 and S15). (d) Box plot showing the CpG3 methylation in the T versus the G allele of the 49 TG patients (Additional file 1: Table S14 and S15). aaa p < 0.001 (Mann-Whitney U test). (e) Box plot showing the CpG3 methylation in the 40 normal controls (NC, teal) and 77 normally methylated oligozoospermic (NMO, purple) divided according to the SNP rs2071094 genotype. No significant differences between normal and oligozoospermic patients sharing the same genotype (Wilcoxon rank-sum test; Additional file 1: Tables S13 and S15)

Discussion

Aberrant DNA methylation patterns of imprinted genes have been reported in semen samples from infertile men in a number of studies [16, 36]. While the majority of studies focused on the analysis of selected ICRs, mainly MEST and H19, these reports still differed with regard to the observed differences between normal and infertile men. Specifically, aberrant methylation patterns for only MEST or H19 were described in some patients, whereas others apparently carried a subpopulation of sperm which showed the same degree of aberrant imprinting in multiple imprinted genes (MEST, LIT1, H19, MEG3) and thereby indicated epigenetic mosaicism in sperm from OAT men [13]. Defects in imprint erasure or imprint establishment in the male or female germline are known to cause imprinting diseases such as Prader-Willi or Angelman syndrome in offspring (for review see [37]). The inheritance of the sperm epigenome in other instances is a matter of debate [38, 39]. For these reasons, it was a clinical necessity to assess the frequency and extent of epimutations, not only in selected genes but also in the entire genome of oligozoospermic men undergoing assisted reproduction. To this end, this study sought to assess the DNA methylation levels in normal and severely impaired spermatogenesis by whole-genome and ultra-deep bisulfite sequencing.

In the screening process of the 93 samples from patients with severe oligozoospermia, which makes our study one of the largest in its field, only 1% showed aberrant methylation for MEST alone and 1% for H19 alone. Five percent of samples appeared to be aberrantly methylated at both imprinted genes, whereas the great majority (93%) showed normal methylation levels for MEST and H19. The presence of these four subgroups when analysing only MEST and H19 methylation values is in line with previous publications [5, 17, 38, 39], although the percentages of samples with aberrant profiles reported in those publications were generally higher (e.g. 57% in Poplinski et al. [5]).

While in some studies, all of the analysed CpG sites within the CTCF6 region of H19 were either methylated or unmethylated [5], in other studies, the methylation differences were restricted to single CpG sites within this region [4, 17, 26, 27, 29, 31, 32]. We also observed, in both normozoospermic and oligozoospermic samples, a fraction of partially unmethylated reads in our H19 DBS amplicons, which cover a large proportion of the CpGs that had been analysed by Sanger sequencing of subcloned PCR products or by pyrosequencing in the above-mentioned studies. The most variable H19 CpG in our assay is CpG3, which corresponds to CpG4 in Camprubi et al. [27], CpG5 in Boissonnas et al. [26] (mistakenly excluded from their analysis; instead, the authors should have excluded CpG6 for being a CpG-SNP), and CpG6 in other studies [4, 17, 29, 31, 32]. We demonstrate that variation in DNA methylation at this CpG site within the H19 CTCF6 region is correlated with the genotype of a nearby SNP (rs2071094), irrespective of the fertility status, with GG homozygotes having the lowest and TT homozygotes the highest methylation levels. The SNP rs2071094 is in high linkage disequilibrium with the CpG-SNP rs10732516, suggesting that the presence or absence of an additional CpG site next to CpG3 could influence the methylation of the latter. These results support the view that DNA methylation patterns are influenced to a large extent by the genetic background [25] and suggest that studies reporting reduced methylation levels of this CpG within the H19 CTCF6 region in oligozoospermic men might have been confounded by a fortuitously higher G allele frequency in cases compared to controls. We identified one individual sample showing an aberrant methylation level of the H19 CTCF6 and CTCF4 regions as well as a very small proportion of completely methylated reads in the XIST, KCNQ1OT1 and PEG10 amplicons. We are uncertain whether this sample carries a true H19 epimutation, has a rare genetic variant or contains minute amounts of somatic DNA, which show up in some but not in all PCRs.

In this study, we focused on the genome-wide DNA methylation analysis of the two most prominent groups of oligozoospermic samples: those with abnormal methylation of MEST and H19 (AMO) and those with normal methylation levels in both regions (NMO). Unexpectedly, the former group of samples displayed the same level of aberrant methylation not only in H19 and MEST but also in all of the 50 known ICRs as well as in DDX4 and XIST. Moreover, 2764 soma-germ cell-specific DMRs were also aberrantly methylated to the same degree. Since many genes from this list are necessary for meiosis, spermatid development or spermiogenesis, it is highly unlikely that germ cells in which these genes are silenced by promoter methylation would have produced motile spermatozoa. In contrast, the presence of residual somatic cell DNA, shifting the methylation level towards that of somatic cells, appears to be the more plausible explanation.
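The admixture argument above can be made concrete with a simple linear mixing model, offered here as an illustrative assumption and not as an analysis performed in the study. If a fraction f of the DNA molecules at a locus is somatic, the observed methylation is f × (somatic level) + (1 − f) × (sperm level), so f can be estimated independently at each locus. The reference levels used below are idealised assumptions: 1.0 in sperm and 0.5 in somatic cells for a paternally methylated ICR such as H19, 0.0 and 0.5 for a maternally methylated ICR such as MEST, and 0.0 and 1.0 for a promoter that is methylated in blood and unmethylated in sperm.

```python
def somatic_fraction(observed, sperm_level, somatic_level):
    """Estimate the somatic DNA fraction at one locus under linear mixing.

    observed = f * somatic_level + (1 - f) * sperm_level, solved for f.
    All levels are methylation fractions between 0 and 1.
    """
    if sperm_level == somatic_level:
        raise ValueError("locus is uninformative: sperm and somatic levels are equal")
    return (observed - sperm_level) / (somatic_level - sperm_level)


# Hypothetical observations for one sample with 20% somatic DNA.
f_h19 = somatic_fraction(observed=0.90, sperm_level=1.0, somatic_level=0.5)
f_mest = somatic_fraction(observed=0.10, sperm_level=0.0, somatic_level=0.5)
f_promoter = somatic_fraction(observed=0.20, sperm_level=0.0, somatic_level=1.0)
```

Under this model, a contaminated sample yields similar estimates of f at unrelated loci, which matches the observation that the degree of aberrant methylation was the same across ICRs, DDX4, XIST and the soma-germ cell-specific DMRs. A genuine epimutation confined to particular loci would not be expected to produce such concordance.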

After the exclusion of samples showing abnormal methylation levels of either DDX4 or XIST, which is consistent with a clear presence of somatic DNA (16% of our oligozoospermic samples), only one sample with aberrant methylation at H19 remained. However, it is unclear whether this sample contains traces of somatic DNA, since the proportion of abnormally methylated reads is very small but occurs in three additional amplicons. Nevertheless, the percentage of oligozoospermic patients possibly carrying an imprinting defect in our cohort (0–1%) is much lower than previously reported (as high as 57% in Poplinski et al. [5]). We suspect that other studies were also affected by somatic DNA contamination.

The origin of somatic cell DNA in swim-up purified sperm samples remains hitherto unclear. It has been reported that increased numbers of leucocytes are present in the semen of 30% of infertile men, even in the absence of an infection [40]. It appears possible that in these cases, somatic cells or cell fragments that escape quality controls could be amidst the very few sperm that are present in the infertile samples and skew the analyses in the direction of a somatic cell profile. Also, DNA fragments released from apoptotic or necrotic somatic cells may tightly stick to the sperm cells, although there is no evidence for this assumption to date. Our unexpected result highlights the importance of assessing sperm DNA samples for the absence of somatic cell DNA prior to methylation studies. Along this line, pre-screening approaches have been published, which describe multiple sites enabling the distinction of germ cell versus somatic cell-derived DNA [41]. As shown here, the analysis of MEST, H19, XIST and DDX4 loci is sufficient to identify somatic contamination. In our experience, contaminated samples show aberrant methylation in at least two loci. Furthermore, we describe 2764 DMRs that overlap with the promoters of 2462 genes previously shown to be expressed in germ cells and not in testicular somatic cells [22]. A subset of at least four of these DMRs may also be used to assess the purity of a sample. This comprehensive list of DMRs constitutes a valuable resource for future studies seeking to assess the purity of their sperm samples.

It is surprising that so many genes, both protein-coding and non-coding, appear to be regulated by promoter methylation. Most often, cellular differentiation does not involve promoter methylation, but methylation of distal regulatory elements such as enhancers. Interestingly, most of the 2462 genes are methylated in blood cells and unmethylated in germ cells. This suggests that these genes need to be permanently silenced in somatic cells. Since many of these genes play a role in meiosis, it is tempting to speculate that they are permanently silenced in somatic cells to prevent them from interfering with mitosis.

When comparing the genome-wide methylomes of sperm samples from normozoospermic and oligozoospermic patients displaying normal MEST and H19 methylation levels, we did not find any recurrent methylation difference between the two groups. This is in contrast to a recent report in which the authors claim to have identified 217 DMRs between unpurified sperm from nine fertile and 12 infertile men [23]. However, as shown here, the methylation levels at these regions reflect the admixture of somatic DNA and are not biomarkers of infertility.

Our findings show that the DNA methylation patterns of clean sperm are normal, which is reassuring for patients undergoing ART treatment. It is possible that spermatogonia with DNA methylation abnormalities exist, but they likely do not contribute to the mature, swimming sperm population if the epimutations affect genes involved in meiosis, spermatid development or spermiogenesis. We only considered regions consisting of at least five CpG sites for our analysis, which is in contrast to previous publications performing array analysis and considering individual CpG sites [6, 18]. It should be noted, however, that aberrant methylation restricted to one or a few CpGs of an ICR, if real, is unlikely to be of clinical relevance, because in all patients with an imprinting disease based on imprinting errors, almost all CpGs of an ICR are affected [37, 42].

Conclusions

Our results suggest that the undetected presence of somatic DNA, as well as genetic variation, confounds methylation studies in sperm of infertile men. After controlling for these confounders, we have found no evidence for recurrent epimutation in imprinted genes or elsewhere in the genome in sperm of severely oligozoospermic men. While we are aware that WGBS is underpowered to detect rare patients with slightly abnormal sperm methylation levels at non-recurrent CpG sites, we conclude that the prevalence of aberrant methylation in infertile men has likely been overestimated, which is reassuring for patients undergoing ART treatment. In the course of this study, we have also found that a large number of germ cell-specific genes are regulated by promoter methylation. The list of soma-germ cell-specific DMRs can be used for assessing the quality of sperm preparations and for studying the epigenetic regulation of spermatogenesis in more detail.

Methods

Sample selection and clinical information

The patients included in this study were selected among those attending the Department of Clinical and Surgical Andrology at the Centre of Reproductive Medicine and Andrology (CeRA, Münster, Germany) for fertility treatment. All the patients underwent full physical evaluation, and those with known genetic causes of infertility or chromosomal aberrations, those under pharmacological treatment, and those with a history of cryptorchidism, acute infections or tumours were excluded from the analysis. Blood samples were taken for hormone measurements including gonadotropins and testosterone. Chemiluminescent microparticle immunoassays were performed using the Architect i1000 (Abbott Diagnostics, Wiesbaden, Germany) to measure LH (02P40-25), FSH (07K75-25), T (02P13-28), SHBG (08K26-20), prolactin (07K76-25), estradiol (07K72-25) and PSA (07K70-25). DHT was measured using the ACTIVE® Dihydrotestosterone radio-immunoassay (RIA) (DSL-9600 Beckmann-Coulter, Krefeld, Germany), according to the manufacturer’s instructions. The assay shows less than 2% cross-reactivity with T and is calibrated against an LC-MS/MS standard with an accuracy of < 15% within the range of 0.1 to 5 nmol/l. The intra-assay CV is 3.5% and the mean inter-assay CV is 7%. Moreover, semen analysis was performed according to the WHO manual [43]. In total, 133 individuals were selected and subdivided into two age-matched groups according to the spermiogram results: 40 normal controls (NC) diagnosed as normozoospermic and 93 diagnosed as oligoasthenoteratozoospermic, oligoteratozoospermic or oligozoospermic, which are termed OATs throughout the manuscript (Additional file 1: Table S1).

Swim-up procedure for isolation of motile sperm

The swim-up procedure was used to isolate the motile sperm cells, in line with the preparation of samples for assisted reproductive technology treatment. Briefly, after an incubation period of 30 min at 37 °C, 1–2 ml of ejaculate was mixed with an equal volume of sperm preparation medium (10705060, Origio, Denmark) using a cell culture-tested disposable pipette. The mixture was then centrifuged at 390g for 10 min, the supernatant decanted and the remaining drops aspirated. The pellet was washed with 2 ml of medium and centrifuged at 390g for 10 min. After removing the supernatant, 1 ml of medium was carefully layered onto the pellet so as not to resuspend or dislodge it. As a precaution, the tube was briefly centrifuged for 1 min at 390g and then incubated for 60 min at 37 °C and 5% CO2. After the incubation, 500–700 μl of the uppermost layer was collected and stored in a small cell culture tube. A total of 20 μl of the cell suspension was used to determine the sperm concentration in a Neubauer improved counting chamber (Additional file 1: Table S1). The remaining volume was centrifuged for 5 min at 16,060g, the supernatant was discarded and the sperm pellet was stored at −20 °C.

DNA isolation

The DNA isolation was performed on the swim-up purified sperm using the MasterPure DNA purification kit (MC85200, Epicentre Biotechnologies, Madison, WI, USA) as previously described [13]. DNA concentration was measured using a fluorescence plate reader (FLUOstar Omega, BMG Labtech, Germany).

Whole-genome bisulfite sequencing

Sperm WGBS libraries were prepared according to a modified protocol based on the tagmentation-based method described by Wang et al. [44] and further simplified by Souren et al. [45]. Briefly, 10 ng sperm DNA supplemented with 1% unmethylated lambda-DNA (Promega, D152A) was incubated in a 50-μl reaction with 0.8 μl of Tn5 transposase in 1× TD buffer from the Nextera library preparation kit (Illumina, FC-121-1030) for 5 min at 55 °C. Tagged DNA was purified with the DNA Clean & Concentrator-5 kit (Zymo Research, D4013), eluting with 14 μl EB buffer (Qiagen, 19086), followed by gap repair by adding 2 μl of 10× NEBuffer 2 (NEB, B7002S), 3 μl of dNTPs (2.5 mM each) and 5 U Klenow exo- (NEB, M0212S) and incubating for 1 h at 30 °C. Bisulfite conversion was performed using the EZ DNA Methylation-Gold kit (Zymo Research, D5005) according to the manufacturer’s instructions. Indexed libraries were obtained by enrichment PCR with 1× HotStarTaq Master Mix (Qiagen, 203445), 100 nM of each primer and 10 μl bisulfite-converted DNA in 40 μl reactions (PCR settings: 95 °C 15 min, 12× (95 °C 30 s, 53 °C 2 min, 72 °C 1 min), and 72 °C 7 min). Reactions were purified twice using 0.8× volume AMPure XP Beads (Beckman Coulter, A63881) and eluted in 10 μl EB buffer (Qiagen, 19086). Libraries were sequenced on a HiSeq4000 (Illumina) in 100-bp paired-end runs using one lane per sample.
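
Because the lambda spike-in is unmethylated, any cytosine call on lambda reads that still appears methylated reflects incomplete bisulfite conversion. The following Python sketch illustrates how a conversion rate could be estimated from per-cytosine counts. The record layout (chromosome, methylated count, unmethylated count) and the contig name "lambda" are assumptions made for illustration; the sketch does not reproduce any tool's report format or the R code used in this study.

```python
def conversion_rate(records, spike_in="lambda"):
    """Estimate bisulfite conversion from an unmethylated spike-in.

    records: iterable of (chrom, n_methylated, n_unmethylated) tuples.
             This layout is hypothetical, not a specific tool's format.
    Returns the fraction of spike-in cytosine calls read as unmethylated,
    or None if there are no spike-in calls.
    """
    meth = 0
    unmeth = 0
    for chrom, n_meth, n_unmeth in records:
        if chrom == spike_in:
            meth += n_meth
            unmeth += n_unmeth
    total = meth + unmeth
    if total == 0:
        return None
    return unmeth / total


# Toy data: 3 of 1000 spike-in cytosine calls remain unconverted.
example = [
    ("lambda", 1, 499),
    ("lambda", 2, 498),
    ("chr1", 40, 10),  # genomic calls are ignored
]
rate = conversion_rate(example)
```

In this toy example, 997 of 1000 spike-in calls are unmethylated, so the estimated conversion rate is 99.7%.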

WGBS data analysis

Raw read data were aligned against the reference genome hg38 using bwa-meth [46] (v0.2.0) and deduplicated with Picard’s MarkDuplicates [47] (v2.18.15). Alignments were sorted and indexed using samtools [48] (v1.9). We used MethylDackel [49] (v0.3.0) for subsequent methylation calling. For quality control, we used MultiQC [50] to integrate quality metrics collected by Picard, FastQC [51] (v0.11.8) and Qualimap [52] (v2.2.2b). We chose camel [53] (v0.4.7) and metilene [54] (v0.2.6) to call DMRs. While camel uses t statistics to identify differentially methylated CpGs, metilene reports FDR-corrected p values for DMRs. Average coverage per DMR was computed using mosdepth [55] (v0.2.3). We used R (v3.4.1) to compute conversion rates based on MethylDackel methylation reports and to perform DMR annotation and filtering. DMRs were filtered directly on each DMR caller’s output, based on the number of CpGs covered, the methylation difference between groups, the q values reported by metilene and the average coverage as computed by mosdepth. For the blood/sperm comparison, we required DMRs to cover at least 10 CpGs with at least 80% difference in methylation, a minimum coverage of 5 reads and a maximum q value of 0.05. When comparing NC and OAT samples, we set the thresholds to 5 CpGs, 30% methylation difference, a minimum coverage of 5 reads and a maximum q value of 0.05. After filtering, we merged DMRs using the GenomicRanges R package [56]. Merged DMRs were annotated for overlap with CGIs using data from the UCSC database [57]. Genes and promoters were annotated using information from the Ensembl database [58]. We required genes to be marked as either protein-coding, long non-coding RNA or miRNA. Promoters were defined as the 2000 bp region around TSSs.
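
The filtering and merging steps can be illustrated with a short Python sketch. It is not the R code used in this study: the DMR record, its field names and the toy values are hypothetical, and the merge function only approximates interval merging as provided by GenomicRanges. It assumes half-open coordinates and joins intervals that overlap or touch. Only the thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class DMR:
    # Hypothetical record; field names are illustrative only.
    chrom: str
    start: int
    end: int              # half-open interval [start, end)
    n_cpgs: int
    meth_diff: float      # difference in mean methylation, range -1..1
    q_value: float
    mean_coverage: float


# Thresholds as stated in the text for the two comparisons.
THRESHOLDS = {
    "blood_vs_sperm": dict(min_cpgs=10, min_diff=0.80, min_cov=5, max_q=0.05),
    "nc_vs_oat": dict(min_cpgs=5, min_diff=0.30, min_cov=5, max_q=0.05),
}


def passes(dmr, min_cpgs, min_diff, min_cov, max_q):
    """Return True if a DMR meets all four filter criteria."""
    return (
        dmr.n_cpgs >= min_cpgs
        and abs(dmr.meth_diff) >= min_diff
        and dmr.mean_coverage >= min_cov
        and dmr.q_value <= max_q
    )


def merge_overlapping(dmrs):
    """Merge overlapping or touching intervals on the same chromosome."""
    merged = []
    for d in sorted(dmrs, key=lambda d: (d.chrom, d.start, d.end)):
        if merged and merged[-1][0] == d.chrom and d.start <= merged[-1][2]:
            chrom, start, end = merged[-1]
            merged[-1] = (chrom, start, max(end, d.end))
        else:
            merged.append((d.chrom, d.start, d.end))
    return merged


def filter_and_merge(dmrs, comparison):
    """Apply the thresholds for one comparison, then merge the survivors."""
    kept = [d for d in dmrs if passes(d, **THRESHOLDS[comparison])]
    return merge_overlapping(kept)


# Toy candidates (invented values, for illustration only).
candidates = [
    DMR("chr11", 100, 400, 12, 0.85, 0.01, 8.0),
    DMR("chr11", 350, 600, 15, -0.90, 0.001, 6.5),
    DMR("chr11", 900, 1000, 4, 0.95, 0.001, 9.0),  # too few CpGs
    DMR("chr7", 50, 150, 6, 0.35, 0.04, 5.0),      # passes NC/OAT only
]
blood_sperm = filter_and_merge(candidates, "blood_vs_sperm")
nc_oat = filter_and_merge(candidates, "nc_vs_oat")
```

With these toy records, the two overlapping chr11 candidates pass the stricter blood/sperm thresholds and are merged into a single region, whereas the chr7 candidate is retained only under the NC/OAT thresholds.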

Targeted deep bisulfite sequencing

Targeted DBS was performed on the Roche/454 or the Illumina MiSeq platform essentially as described previously [59]. The bisulfite conversion was performed on 100 ng of sperm DNA using the EZ DNA Methylation-Gold kit (Zymo Research, Freiburg, Germany) according to the manufacturer’s protocol. The bisulfite-converted DNA was eluted in 10 μl of TE buffer. The primer pairs and PCR conditions are described in Additional file 1: Table S8. Although the H19 amplicon comprises 15 CpGs, only 14 CpGs are shown because the CpG affected by a known polymorphism (rs10732516) was masked in the analyses.

Statistics

Normality and homoscedasticity tests were performed for all variables, and differences between groups were assessed by non-parametric tests: the Wilcoxon signed-rank test for two dependent groups and the Mann-Whitney U test for two independent groups, followed by Bonferroni correction for multiple testing. The Kruskal-Wallis rank-sum test was used to compare three or more independent groups, followed by multiple pairwise comparisons. Statistical analysis and graph plotting were performed using R 3.5.3 [60] and appropriate R packages, namely stats [60] (v3.5.3), ggplot2 [61] (v3.2.1) and factoextra [62] (v1.0.5).
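
As an illustration of two of the procedures named above, the following Python sketch computes the Mann-Whitney U statistics for two independent samples and applies a Bonferroni correction to a list of p values. It is a minimal sketch and not the R code used in this study: the function names and toy values are hypothetical, and deriving a p value from U is deliberately left to a statistics package.

```python
def mann_whitney_u(x, y):
    """Return (U_x, U_y) for two independent samples.

    U_x counts the pairs (xi, yj) with xi > yj, with ties counted as 0.5.
    U_x + U_y always equals len(x) * len(y).
    """
    u_x = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u_x += 1.0
            elif xi == yj:
                u_x += 0.5
    u_y = len(x) * len(y) - u_x
    return u_x, u_y


def bonferroni(p_values):
    """Multiply each p value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Toy data (invented values, for illustration only).
group_a = [1, 2, 3]
group_b = [4, 5, 6]
u_a, u_b = mann_whitney_u(group_a, group_b)   # complete separation

adjusted = bonferroni([0.01, 0.04, 0.5])
```

In the toy example the two groups are completely separated, so one U statistic is 0 and the other equals the number of pairs (9). After correction for three tests, only the first p value remains below 0.05.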