Background

There is growing interest in the role that epigenetic marks such as histone modifications, non-coding RNAs, and DNA methylation may play as biological mechanisms through which environmental exposures and other physiological and lifestyle factors can lead to disease. Unlike the genetic sequence, epigenetic modifications are dynamic and can change over time or in response to exposures. Furthermore, host factors such as sex and age also contribute to inter-individual differences in epigenetic markers.

Previous studies of DNA methylation using the Illumina 27 K BeadChip methylation array have reported autosomal differentially methylated positions (DMPs), that is, CpG sites whose methylation varies between males and females, suggesting that it is important to adjust for sex in analyses of methylation data [1–6]. However, these studies did not account for non-specific probes for autosomal CpGs that cross-react with CpGs on sex chromosomes, thereby yielding false positives [7]. Recently, McCarthy et al. published a meta-analysis of 76 studies, all using the 27 K BeadChip array, to identify sex-associated autosomal DMPs across specimens from multiple tissue types from adults and children [8]. After excluding the sex-biased cross-reactive probes, they identified 184 DMPs that were associated with sex.

While McCarthy et al. identified several interesting autosomal DMPs, their study focused on methylation assessed by the 27 K BeadChip. In 2011, Illumina released a new version of its methylation array, the 450 K BeadChip, which greatly expanded the number of CpGs interrogated to over 480,000 sites. Further, the approach of McCarthy et al. was restricted to identification of individual DMPs rather than differentially methylated regions (DMRs). DMR-finding approaches have several advantages over considering CpG sites individually, including a decreased likelihood of hits arising from technical artifacts and possibly greater functional relevance of the results.

As methylation is cell-type specific and immune cell profiles have been shown to vary between sexes, consideration of cell composition is of utmost importance in methylation studies [9, 10]. Yet previous studies of sex-associated differences in methylation [1–6] have not taken this into account in their analyses. White blood cell composition can be estimated computationally from 450 K BeadChip data in adults [11, 12], but these estimates are not appropriate for young children in their current implementation [13]. As an alternative, differential cell count (DCC) can be used to determine cell type proportions (% lymphocytes, monocytes, neutrophils, eosinophils, and basophils) in cord blood samples.

Here, we use the 450 K BeadChip to assess sex differences in DNA methylation from umbilical cord blood from boys and girls participating in a large epidemiologic cohort followed by the Center for the Health Assessment of Mothers and Children of Salinas (CHAMACOS) study. We use DCCs to account for white blood cell composition. In addition to interrogating DMPs, we apply the newly released ‘DMRcate’ methodology [14] to identify sex-associated DMRs in newborns.

Methods

Study population

The CHAMACOS study is a longitudinal birth cohort study of the effects of exposure to pesticides and environmental chemicals on the health and development of Mexican-American children living in the agricultural region of Salinas Valley, CA. Detailed description of the CHAMACOS cohort has previously been published [15, 16]. Briefly, 601 pregnant women were enrolled in 1999–2000 at community clinics and 527 liveborn singletons were born. Follow up visits occurred at regular intervals throughout childhood. For this analysis, we include the subset of subjects that had both 450 K BeadChip data and differential cell count analysis available at birth (n = 111). Mothers retained in the study subset had a mean age of 25.8 years (±5.1 SD) at time of delivery. Study protocols were approved by the University of California, Berkeley Committee for Protection of Human Subjects. Written informed consent was obtained from all mothers.

Blood collection and processing

Cord blood was collected and stored in both heparin coated BD vacutainers (Becton, Dickinson and Company, Franklin Lakes, NJ) and vacutainers without anticoagulant at the same time. Blood clots from anticoagulant-free vacutainers were stored at −80 °C and used for isolation of DNA for DNA methylation analysis. Heparinized cord blood was used to prepare whole blood slides using the push-wedge blood smearing technique [17] and stored at −20 °C until staining for differential white blood cell count.

DNA preparation

DNA isolation was performed using QIAamp DNA Blood Maxi Kits (Qiagen, Valencia, CA) according to the manufacturer’s protocol with small, previously described modifications [18]. Following isolation, all samples were checked for DNA quality and quantity using a Nanodrop 2000 Spectrophotometer (Thermo Scientific, Waltham, MA). Those of good quality (260/280 ratio exceeding 1.8) were normalized to a concentration of 50 ng/µl.

450 K BeadChip DNA methylation analysis

DNA samples were bisulfite converted using Zymo Bisulfite Conversion Kits (Zymo Research, Irvine, CA), whole genome amplified, enzymatically fragmented, purified, and applied to Illumina Infinium HumanMethylation450 BeadChips (Illumina, San Diego, CA) according to manufacturer protocol. Locations of samples from boys and girls were randomly assigned across assay wells, chips and plates to prevent any batch bias. 450 K BeadChips were handled by robotics and analyzed using the Illumina Hi-Scan system. DNA methylation was measured at 485,512 CpG sites.

Probe signal intensities were extracted using the methylation module of Illumina GenomeStudio software (version XXV2011.1, Methylation Module 1.9) and background subtracted. Systematic QA/QC was performed, including assessment of assay repeatability and of batch effects using 38 technical replicates, and data quality was established as previously described [19]. Samples were retained only if 95 % of sites assayed had detection P < 0.01. Color channel bias, batch effects and differences in Infinium chemistry were minimized by application of the All Sample Mean Normalization (ASMN) algorithm [19], followed by Beta Mixture Quantile (BMIQ) normalization [20]. Sites with annotated probe SNPs and with common SNPs (minor allele frequency >5 %) within 50 bp of the target identified in the MXL (Mexican ancestry in Los Angeles, California) HapMap population were excluded from analysis (n = 49,748). Probes where 95 % of samples had detection P > 0.01 were also dropped (n = 460). Since our analysis was focused on CpG sites associated with sex, we excluded sites on the Y chromosome (n = 95) and X-chromosome cross-reactive probes (n = 29,233) identified by Chen and colleagues [7]. The remaining 410,072 CpG sites were included in the analysis of sex. Methylation values at all sites were logit transformed to the M-value scale to better comply with modeling assumptions [21].
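The logit transformation to the M-value scale mentioned above can be illustrated with a minimal sketch. It assumes the common base-2 definition M = log2(β / (1 − β)); the function names and the clamping constant `eps` are illustrative choices, not details taken from the study pipeline.

```python
import math

def beta_to_m(beta, eps=1e-6):
    """Base-2 logit transform of a methylation beta value to an M-value."""
    # Clamp away from 0 and 1 so the logarithm stays finite
    # (eps is an assumed value, not one reported in the text).
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))

def m_to_beta(m):
    """Inverse transform, from the M-value scale back to the beta scale."""
    return 2.0 ** m / (2.0 ** m + 1.0)
```

Under this definition a β value of 0.5 maps to M = 0, and values near 0 or 1 are spread out, which is the property that makes M-values better suited to linear modeling.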

Differential cell counts

Whole blood smear slides were stained utilizing a DiffQuik® staining kit, a modern commercial variant of the Romanovsky stain, a histological stain used to differentiate cells on a variety of smears and aspirates. This staining highlights cytoplasmic details and neurosecretory granules, which are utilized to characterize the differential white blood count. The staining kit is composed of a fixative (3:1 methanol: acetic acid solution), eosinophilic dye (xanthene dye), basophilic dye (dimethylene blue dye) and wash (deionized water). For consistency and to ensure the best results the slides were all fixed for 15 min at 23 °C (room temperature), stained in both the basophilic dye and eosinophilic dye for 5 s each and washed after each staining period to prevent the corruption of the dye.

Slides were scored for white blood cell type composition by Zeiss Axioplan light microscope with 100× oil immersion lens. Scoring was conducted at the perceived highest density of white blood cells using the standard battlement track scan method, which covers the entire width of a slide examination area. Counts for each of the five identifiable cell types (lymphocytes, monocytes, neutrophils, eosinophils, and basophils) were recorded by a dedicated mechanical counter. At least 100 cells were scored for each slide following validation of reproducibility by the repeated scoring of 5 sets of 100 cells from the same slide (CV ≤ 5 %).
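As an illustration of the scoring arithmetic, the sketch below converts raw counts for the five cell types into percentages and computes a coefficient of variation across repeated scorings. All names are hypothetical, and the text does not state which quantity the CV was computed on, so applying it to repeated counts of a single cell type is an assumption.

```python
from statistics import mean, stdev

CELL_TYPES = ("lymphocytes", "monocytes", "neutrophils", "eosinophils", "basophils")

def differential_percentages(counts, min_cells=100):
    """Convert raw counts per cell type into percentages of all cells scored."""
    total = sum(counts[c] for c in CELL_TYPES)
    if total < min_cells:
        raise ValueError("fewer cells scored than the required minimum")
    return {c: 100.0 * counts[c] / total for c in CELL_TYPES}

def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)
```

For example, five repeated neutrophil counts of 58, 60, 57, 59 and 61 per 100 cells (made-up numbers) would be passed to `coefficient_of_variation` and compared against the 5 % criterion.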

DMP analysis

The association between sex at birth and 450 K DNA methylation at individual CpGs was assessed by linear regression, adjusting for DCC variables and analysis batch. This analysis was performed using R statistical computing software (v3.1.0) [22]. Although DCC estimates were not significantly associated with sex, we chose to include them in the model because likelihood ratio tests showed that including them improved model fit for more than 2000 of the CpG sites assessed by the 450 K BeadChip. We also examined gestational age and birthweight as possible covariates, since both have been shown to be associated with DNA methylation [23], and performed sensitivity analyses to assess their potential impact. However, neither was associated with child sex or contributed to improved model fit.
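The per-CpG model can be sketched as ordinary least squares of M-values on a 0/1 sex indicator plus covariates. The study used R; the pure-Python version below is only an illustration, the function names are hypothetical, and the covariate columns stand in for the DCC variables and batch indicators. It returns the sex coefficient and its t statistic, leaving the p-value lookup to a statistics library.

```python
import math

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def sex_effect(m_values, sex, covariates):
    """OLS for one CpG; returns (coefficient, t statistic) for the sex term."""
    # Design matrix columns: intercept, sex (0/1), then the covariates.
    x = [[1.0, float(s)] + list(c) for s, c in zip(sex, covariates)]
    n, p = len(x), len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(x, m_values)) for i in range(p)]
    beta = solve(xtx, xty)
    resid = [y - sum(b * v for b, v in zip(beta, r)) for r, y in zip(x, m_values)]
    sigma2 = sum(e * e for e in resid) / (n - p)
    # Variance of the sex coefficient is sigma^2 times the matching
    # diagonal element of the inverse of X'X.
    unit = [0.0] * p
    unit[1] = 1.0
    var_sex = sigma2 * solve(xtx, unit)[1]
    return beta[1], beta[1] / math.sqrt(var_sex)
```

Running this function once per CpG yields one test statistic per site, which is the input to the multiple-testing correction.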

P-values were corrected for multiple testing using a Benjamini-Hochberg (BH) FDR threshold of 0.05 [24].
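The Benjamini-Hochberg step-up adjustment can be written in a few lines. This is a generic sketch of the published procedure rather than the code used in the study, and the function name is illustrative.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, so each adjusted value is
    # no larger than the adjusted values of all higher-ranked p-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A CpG is then called significant when its adjusted p-value is below the chosen FDR threshold, here 0.05.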

Enrichment of annotated genomic features

Comparison of sex-DMP results with annotated functional categories, including relation to genes (TSS1500, TSS200, 5′UTR, 1stExon, Body, 3′UTR, Intergenic) and CpG islands (Island, Shore, Shelf, Open Sea), was performed using UCSC Genome Browser annotations supplied by Illumina. A χ² test of independence with 1 degree of freedom was used to determine whether there was evidence of enrichment among DMP results (P < 0.05).
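A 1-degree-of-freedom χ² test of this kind can be sketched for a single annotation category as a 2×2 table of hit versus non-hit by inside versus outside the category. The text does not spell out the exact table layout or whether a continuity correction was applied, so both are assumptions here (no correction is used), and the function name is hypothetical.

```python
import math

def chi2_enrichment(hits_in, hits_total, array_in, array_total):
    """Pearson chi-square statistic and p-value (1 df) for one annotation category."""
    a = hits_in                            # hits inside the category
    b = hits_total - hits_in               # hits outside the category
    c = array_in - hits_in                 # non-hits inside the category
    d = (array_total - hits_total) - c     # non-hits outside the category
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the chi-square upper tail equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(stat / 2.0))
    return stat, p
```

The statistic alone does not indicate direction, so over- or underrepresentation is read from comparing the proportion of hits in the category with the proportion of all assayed sites in it.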

DMR analysis

Identification of sex-associated DMRs was performed using the method described by Peters et al. [14] and implemented in the DMRcate Bioconductor R package [25]. The approach begins by fitting a standard limma linear model to all CpG sites in parallel [26]. This model was parameterized identically to the DMP analysis, with sex as the binary predictor of interest, adjusting for DCC variables and analysis batch. The CpG site test statistics were then smoothed by chromosome according to the DMRcate defaults, which employ a Gaussian kernel smoother with bandwidth λ = 1000 base pairs (bp) and scaling factor C = 2. The resulting kernel-weighted local model fit statistics were compared to modeled values using the method of Satterthwaite [27] to produce p-values, which were adjusted for multiple testing using a BH FDR threshold of 0.05 [24]. DMRs were assigned by grouping FDR-significant sites that lie at most λ bp from one another and that together comprise at least two CpGs. Under this method, CpGs are collapsed into DMRs without considering the direction of the association with the predictor (i.e. sex). The minimum BH-adjusted p-value within a given DMR is taken as representative of the statistical inference for that region, and the maximum fold change in methylation values (here on the M-value scale) summarizes the effect size.
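The final grouping step, in which FDR-significant CpGs lying within λ bp of one another are merged into regions, can be sketched as follows. This is a simplified illustration and not the DMRcate implementation: it omits the kernel smoothing and the Satterthwaite approximation, and the function name and the input layout are assumptions.

```python
def group_into_regions(sites, lam=1000, min_cpgs=2):
    """Group significant CpGs into regions.

    sites: iterable of (chromosome, position, adjusted_p) tuples for
    CpGs that already passed the FDR threshold.
    """
    groups = []
    current = []
    for chrom, pos, fdr in sorted(sites):
        # Start a new group on a chromosome change or a gap larger than lam.
        if current and (chrom != current[-1][0] or pos - current[-1][1] > lam):
            groups.append(current)
            current = []
        current.append((chrom, pos, fdr))
    if current:
        groups.append(current)
    return [
        {
            "chrom": g[0][0],
            "start": g[0][1],
            "end": g[-1][1],
            "n_cpgs": len(g),
            "min_fdr": min(s[2] for s in g),  # representative p-value for the region
        }
        for g in groups
        if len(g) >= min_cpgs
    ]
```

Because the grouping looks only at positions and significance, CpGs with opposite directions of effect can fall in the same region, which matches the behavior described above.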

Gene ontology analysis

Gene ontology term enrichment analysis was performed by DAVID [28, 29], WebGestalt (WEB-based Gene SeT AnaLysis) [30], and ConsensusPathDB [31], using hypergeometric distribution to assess enrichment significance. Visualization of results and GO term categorization by semantic similarity dimension reduction was performed by REVIGO [32].
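The hypergeometric enrichment test that these tools rely on can be illustrated with a short sketch. It computes the upper-tail probability of seeing at least the observed number of annotated genes in the hit list; the function and argument names are illustrative and do not correspond to the interface of DAVID, WebGestalt, or ConsensusPathDB.

```python
from math import comb

def hypergeom_enrichment_p(pop_size, pop_annotated, sample_size, sample_annotated):
    """P(X >= sample_annotated) when sample_size genes are drawn without
    replacement from pop_size genes, pop_annotated of which carry the GO term."""
    upper = min(pop_annotated, sample_size)
    total = comb(pop_size, sample_size)
    tail = sum(
        comb(pop_annotated, k) * comb(pop_size - pop_annotated, sample_size - k)
        for k in range(sample_annotated, upper + 1)
    )
    return tail / total
```

Each GO term gives one such p-value, and the set of p-values is then corrected for multiple testing before terms are reported as enriched.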

Results

Sex-associated differentially methylated positions in newborns

Analysis of DNA methylation differences between newborn boys and girls was performed by linear regression for 450 K BeadChip CpGs among subjects with DCC measurements (n = 111; 58 girls and 53 boys), adjusting for cell composition and batch (Table 1). After data cleaning, n = 410,072 CpGs were analyzed, which excluded sites previously reported to exhibit sex-chromosome specific cross-reactivity [7]. Resulting p-values were plotted by chromosome, with sites having higher methylation levels in girls compared to boys plotted above the x-axis and those with lower levels plotted below (Fig. 1). After adjustment for multiple testing (FDR p < 0.05), we identified 11,776 CpGs that differed significantly by sex in newborns (Table 2). Of those hits, the majority of sites had higher methylation in girls compared to boys (69.0 %). This trend was consistent on both the X chromosome (64.3 % of sites higher in girls) and in autosomes (82.8 %). While the majority of hits were found on the X chromosome (74.3 %), a substantial number were also identified on autosomes (3031 or 25.7 %; Table 2).

Table 1 Demographic characteristics of newborn CHAMACOS subjects, N = 111
Fig. 1 Manhattan plot for association between child sex and DNA methylation at all 450 K CpGs, adjusting for batch and cell composition by differential cell count (DCC). Associations where methylation was higher for girls relative to boys are plotted above the x-axis, while those with decreased methylation are plotted below. CpGs meeting the FDR multiple testing threshold (P < 0.05) are shown in red

Table 2 Summary of sex-associated DMPs

As differential hypermethylation is to be expected for girls due to X-inactivation [33–35], we focused characterization of results on autosomal sites showing sex differences (Table 3 and Additional file 1). Most of these were located in CpG shores, islands and open sea (40.4, 40.1, and 15.4 %, respectively) (Fig. 2 and Table 4). In comparison, shelf regions had the lowest percentage of hits (4.1 %). To assess whether the overrepresentation of hits in CpG islands and shores was due to the design of the 450 K BeadChip, we compared the number of hits in each functional category with the number of CpG sites included in the assay. Both shores and CpG islands were significantly overrepresented among all autosomal hits compared to the 450 K background (χ² = 486.1, P < 0.01 and χ² = 95.5, P < 0.01), while shelf and open sea hits were underrepresented (each with P < 0.01). For CpG sites that were hypermethylated in girls compared to boys, we also observed overrepresentation in CpG islands and shores, and underrepresentation in shelf and open sea locations (all P < 0.01). Sites that were hypomethylated in girls compared to boys were underrepresented in the open sea (30.3 %, P < 0.01) and shelves (5.6 %, P < 0.01). Hypomethylated sites were enriched at islands (χ² = 6.53, P = 0.01), but did not deviate significantly from the 450 K representation of shores (χ² = 3.42, P = 0.06).

Table 3 Results for the top 30 gene-annotated autosomal DMPs associated with sex in CHAMACOS newborns
Fig. 2 Percent of 450 K CpGs (purple), and percent of all (blue), hypermethylated (dark green), and hypomethylated (light green) autosomal differentially methylated positions (DMPs) associated with sex (a & b). These percentages are given by island functional categories (island, shore, shelf, and open sea) in a, and gene functional categories (within 1500 bp of a transcription start site (TSS), 200 bp of a TSS, a 5′ untranslated region (UTR), first exon, gene body, 3′UTR, and intergenic) in b. * indicates that the proportion of sites differs significantly from the coverage on the 450 K BeadChip (P < 0.05)

Table 4 DMPs by gene and CpG island annotation

The 11,776 CpG hits differentially methylated between newborn boys and girls were found in 2250 unique genes, and 1430 (63.6 %) of these genes were located on autosomes. Many genes contained multiple significant sites, with an average of 4.7 CpGs per gene and a maximum of 114 CpGs. However, the largest portion of sex-associated autosomal hits (30.4 %) was located in intergenic regions, and hits were seen at lower than expected frequency in gene bodies (P < 0.01) (Fig. 2). Near transcription start sites (TSS200, 5′UTR, and first exons), all categories were either lower than 450 K CpG design frequencies or did not deviate from them significantly. Further upstream (TSS1500), hits that were hypermethylated in girls were significantly enriched (χ² = 108.5, P < 0.01) while those showing decreased methylation were underrepresented (χ² = 13.3, P < 0.01). At the end of genes (3′UTR), hits that had higher methylation for girls were underrepresented (2.4 %, P < 0.01), while hits having higher methylation for boys did not deviate from expected 450 K frequencies (3.6 %, P = 0.97).

Examining the autosomal genes containing sex-associated DMPs for enrichment of particular gene ontology (GO) terms identified 278 pathways that were significantly enriched (FDR P < 0.05 and at least 5 genes per GO term) (Table 5). These enriched GO terms fell into several broad categories including: 1) nervous system development, 2) behavior, 3) cellular development processes, and 4) cellular signaling and motility (Additional file 2).

Table 5 The top 30 differentially enriched gene ontology pathways among hits for sex in autosomal CpGs

Sex-associated differentially methylated regions in newborns

Additionally, groups of CpGs with 450 K BeadChip methylation differences between newborn boys and girls were identified using the DMR-finding algorithm DMRcate [14, 25]. This approach identifies and ranks DMRs by Gaussian kernel smoothing of results from linear models for individual CpGs that were adjusted for cell composition and array batch (see Methods for details). A total of 3604 DMRs were significantly associated with sex in newborns after correcting for multiple testing (FDR p < 0.05; Table 6 and Additional files 3 and 4). These spanned 2608 genes and contained a total of 22,402 unique CpGs. The number of sites within the DMRs ranged from 2 to 99 CpGs, with 50 % of DMRs containing 5 or more CpGs and 25 % having 8 or more. Further, DMR length averaged 863.8 bp and ranged from 3 bp to 16.5 kb. Figure 3 shows the DNA methylation levels for boys and girls at two example top DMRs. Figure 3a shows 7 CpG sites in a DMR that had higher methylation for girls in a region spanning the PPP1R3G transcription factor on chromosome 6, while Fig. 3b shows 8 CpGs from a DMR with lower methylation among girls in the promoter of PIWIL1, which is an important gene for stem cell proliferation and inhibition of transposon migration [36, 37].

Table 6 Results for the top 30 gene-annotated autosomal DMRs associated with sex in CHAMACOS newborns
Fig. 3 DNA methylation (β values) for CpG sites included in two top DMRs associated with child sex in newborns. One DMR (a) contains 7 CpG sites, is located on chromosome 6 and spans a 1763 bp region in the exon of PPP1R3G (chr6:5085986–5087749). The other (b) on chromosome 12 includes 8 CpGs over a 1365 bp region across the promoter and 1st exon of PIWIL1 (chr12:130821453–130822818). Girls are shown with red circles, boys with blue triangles, and median methylation per CpG by sex is shown by red and blue lines. Green lines show the genomic coordinates of exon regions for each gene shown

As with DMPs, the majority of sex-associated DMRs had higher methylation in girls compared to boys (75.8 %; Additional file 3: Table S1). This was true for both autosomes and sex chromosomes when considered individually, with 83.8 and 58.5 % of DMRs having higher methylation in girls, respectively. However, a greater number of the DMRs identified were located on autosomes (2471 or 68.6 %) than on the X chromosome. Similarly, 70.3 % of the genes covered by sex-associated DMRs were located on autosomes. Further, while the DMRcate method does not constrain all CpGs within a DMR to have the same direction of association with the predictor of interest, we found that the majority of DMRs had 100 % concordance across CpGs in the direction of effect with sex (Additional file 5).

Comparison of the individual site results (DMPs) with the DMR findings revealed that of the 11,776 CpG sites associated with sex in the DMP analysis, 9,941 (84.4 %) were also included in a DMR. On autosomes, DMRs included 83.2 % of sites found as sex-associated DMPs. Conversely, the DMRs added 12,461 total sites (11,719 on autosomes) that had not been found by DMP analysis alone.

Discussion

Here, we assessed sex differences in methylation in newborns as determined by the 450 K BeadChip. Using reliable DCC estimates, ours is the first reported EWAS of sex at birth that adjusted for confounding by cell composition. To our knowledge, ours is also the first study to assess regions of differential methylation associated with sex in addition to considering all CpG sites individually. We identified a large number of X-chromosome CpG sites with higher methylation in girls, which is most likely attributable to X-inactivation [33, 38]. Interestingly, we further demonstrated that a substantial number of autosomal sites and regions also appear hypermethylated in females (Fig. 1 and Table 2).

To assess the consistency of our findings with those of prior analyses, autosomal CpG sites identified as differentially methylated by sex in the current analysis were compared to hits from the three most similar published studies to date (Table 7) [8, 39, 40]. These studies differed from ours either in DNA methylation analysis platform (27 K in McCarthy et al. [8]) or in tissue type used (Xu et al. [39] in human prefrontal cortex and Hall et al. [40] in pancreatic isolates). Although the meta-analysis performed by McCarthy et al. included some studies in umbilical cord blood, most of the studies were performed in adult tissues. Each study found between 184 and 614 autosomal CpG sites that were differentially methylated in association with sex (total of n = 1192 unique sites across all three studies). Our results replicated 428 (35.9 %) of all hits, and 29.4–42.4 % of the hits from each individual study. Further, among replicated sites we observed 98.5–100 % concordance in the direction of methylation differences. While there was substantial overlap between our autosomal sex-associated hits and these previously published results, 2603 (85.9 %) of our hits are novel findings, some of which may be specific to the time point and tissue assessed (umbilical cord blood). Our larger number of hits is likely due to the increased coverage of the 450 K BeadChip. In fact, when considered as a percentage of the number of sites analyzed, we observed a comparable portion of autosomal hits to that found by McCarthy and colleagues using the 27 K platform (0.74 and 0.68 %, respectively; P = 0.25).

Table 7 Comparison of CHAMACOS autosomal sex-associated CpG sites (n = 3031) with other published studies

Importantly, the autosomal methylation increases we observed were most concentrated in CpG islands and shores (Fig. 2a). As this trend was not evaluated in past studies, it should be explored and confirmed in additional datasets. Further, our finding that neurodevelopmental ontology terms were strongly enriched among our autosomal hits suggests that DNA methylation may contribute to differences in cognitive processes early in life. This is consistent with sex differences in brain development and rates of maturation that have previously been observed by magnetic resonance imaging in slightly older children (6–17 years of age) [41], and represents a possible regulatory mechanism requiring additional investigation.

Our autosomal hits included several genes already known to exhibit sex-specific functions. These included the male fertility and spermatogenesis related genes identified by McCarthy and colleagues (DDX43, NUPL1, CRISP2, FIGNL1, SPESP1 and SLC9A2). One of our top hits showing increased methylation for girls (Table 3) was SLC6A4, Solute Carrier Family 6, that is involved in presynaptic reuptake of norepinephrine and has been implicated in several neurological disorders with sex differences in prevalence [42–44]. Similarly, we observed novel sex differences in the SHANK2 and SHANK3 scaffolding protein genes that have been associated with autism spectrum disorders (Tables 3 and 6, Additional file 1) [45, 46]. Further, our hits included the homeobox-containing transcription factor EMX2, Empty Spiracles Homeobox 2, which is required for sexual differentiation and gonadal development [47] and which we found to be hypermethylated among girls (Additional file 1).

The DMR analysis confirmed several trends observed by analyzing CpGs individually. In particular, DMR results again showed that girls tend to exhibit hypermethylation compared to boys. Also, many CpGs found to be autosomal DMPs were separately identified as being located within sex-associated DMRs. Besides confirming many of the findings in the DMP analysis, the application of DMR-finding substantially expanded the number of CpG sites considered significant. These results demonstrate that considering methylation over regions rather than single CpG sites may be a more effective way to identify differentially methylated sites and genes of interest.

Conclusions

We confirmed and expanded previously identified trends in autosomal and X-chromosome methylation sex differences during a previously unstudied window in child development, immediately after birth, that is likely critical in establishing long-term health. This strategy of assessing epigenetic perturbation as near as possible to the prenatal period remains a high priority in light of the fetal origins of human disease hypothesis [48–51].