Introduction

Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease in developed countries [1]. NAFLD pathogenesis is thought to be multifactorial, influenced by lifestyle, diet, and genetics [2] but dominated by elevated central adiposity [3].

Epigenetic processes are key components linking environment, genetics, and metabolic disease risk [4]. Previous epigenome-wide DNA methylation association studies (EWAS) have identified differentially methylated CpG sites (dmCpGs) associated with NAFLD in adults [5, 6]. Children and adolescents have rarely been studied [7], although the influence of early-life exposures on the epigenome may be clearer, and interventions more effective, during periods of physiological plasticity [8].

Adolescents with NAFLD may be at the earliest stages of disease, and confounders such as alcohol consumption, type 2 diabetes mellitus, and other metabolic diseases are less prevalent than in adults. The rapid physiological changes of adolescence define a period in which dysmetabolism may initiate liver damage [9]. Hence, well-characterized adolescent cohorts with a substantial prevalence of NAFLD provide an opportunity to investigate the association between epigenetic variation and NAFLD.

This study aimed to identify dmCpGs in adolescents with NAFLD. We performed a cross-sectional EWAS using whole blood in the population-based Raine Study, in which detailed liver assessment had been undertaken at age 17 [10]. Whole blood is well established in EWAS because it is a relatively accessible tissue for biochemical analysis in population studies. Following the EWAS, we validated the most strongly associated dmCpGs by pyrosequencing and examined their relationship with additional measures of liver biochemistry (γ-glutamyl transferase (GGT), alanine aminotransferase (ALT), and aspartate aminotransferase (AST)). Finally, we examined DNA methylation at 22 dmCpGs associated with NAFLD in adults [6].

Methods

The Raine Study

The Raine Study is a longitudinal cohort study initiated in 1989–1992 in Perth, Western Australia, as a cohort of pregnant women (“Gen1”) and their offspring (“Gen2”). The Raine Study Gen2 cohort is representative of the general population of Western Australia, as described in detail elsewhere [11]. The current cross-sectional follow-up was performed when the cohort had reached approximately 17 years of age (Gen2-17); 1170 participants underwent assessment including (i) a detailed health questionnaire; (ii) anthropometric assessment; (iii) abdominal ultrasonography; and (iv) fasting biochemistry.

Steatosis score and NAFLD definition

NAFLD was diagnosed by ultrasound-confirmed hepatic steatosis together with daily alcohol consumption < 10 g for females and < 20 g for males [12]. Trained sonographers performed ultrasound using a Siemens Antares ultrasound machine with a CH 6–2 curved array probe (Sequoia, Siemens Medical Solutions, Mountain View, CA) according to a standardized protocol [13]. A single radiologist interpreted the images and scored hepatic steatosis severity based on echotexture, deep attenuation, and vessel blurring (0–1, no steatosis; 2, mild steatosis; 3–6, moderate-severe steatosis). The intra-observer reliability (κ statistic) for fatty liver was 0.78 (95% confidence interval [CI] 0.73–0.88). Testing for hepatitis B and C virus infections was not performed because average notification rates for Western Australian adolescents aged 15–19 years over the study period were below 24/100,000 and 23/100,000, respectively [12].
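The scoring bands and the NAFLD definition above amount to a simple decision rule. The sketch below encodes them in Python under one stated assumption: that "ultrasound-confirmed hepatic steatosis" corresponds to a total score of 2 or more (the mild and moderate-severe bands). The function names are hypothetical and are not taken from the study's analysis code.

```python
# Sex-specific daily alcohol limits (g/day) from the NAFLD definition.
ALCOHOL_LIMIT_G = {"female": 10.0, "male": 20.0}


def steatosis_category(score: int) -> str:
    """Map a total ultrasound steatosis score (0-6) to its reported severity band."""
    if not 0 <= score <= 6:
        raise ValueError("steatosis score must be between 0 and 6")
    if score <= 1:
        return "none"
    if score == 2:
        return "mild"
    return "moderate-severe"


def has_nafld(score: int, sex: str, alcohol_g_per_day: float) -> bool:
    """NAFLD: steatosis present (assumed to mean score >= 2) and alcohol below the sex-specific limit."""
    steatosis_present = steatosis_category(score) != "none"
    return steatosis_present and alcohol_g_per_day < ALCOHOL_LIMIT_G[sex]
```

For example, a female participant with a score of 2 who reports 12 g of alcohol per day would not meet the definition, whereas a male participant with the same score and intake would.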

Epigenome-wide DNA methylation profiling

DNA was extracted from blood using the Puregene DNA isolation kit (Qiagen, Venlo, Netherlands) [14]. Epigenome-wide DNA methylation profiling was undertaken with the Illumina (San Diego, CA) Infinium HumanMethylation450 BeadChip array (University of British Columbia Centre for Molecular Medicine and Therapeutics; http://www.cmmt.ubc.ca).

Quality control was performed using shinyMethyl [15], MethylAid [16], and RnBeads [17] as described previously [18]. Beta-mixture quantile normalization (BMIQ) [19] was applied. Technical covariates (plate, slide, well number) were included in all statistical models to adjust for batch effects. Proportions of six cell types (CD8T, CD4T, NK, B cells, monocytes, granulocytes) were estimated using the Houseman method [20].

Statistical analysis

Univariate analysis

A total of 707 of the 1,170 Raine Study Gen2-17 participants who had undergone assessment for NAFLD had complete epigenome and covariate data and were included in the statistical analysis. Continuous demographic and biochemical variables were compared between participants with and without NAFLD using Student’s t or Welch’s tests if normally distributed, and Kruskal–Wallis or Wilcoxon rank-sum tests if skewed. Associations of binary variables with NAFLD were assessed using t-tests for parametric variables and Mann–Whitney U tests for non-parametric variables. Measures of adiposity were BMI and waist circumference, and liver biochemistry comprised serum GGT, ALT, and AST [12]. Insulin-metabolism measures were fasting glucose, fasting insulin, and the homeostasis model assessment of insulin resistance (HOMA-IR). Serum high-sensitivity C-reactive protein (hsCRP), leptin, and adiponectin were also measured [12].
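HOMA-IR is not defined in the text. The sketch below assumes the standard HOMA1-IR formula (fasting glucose in mmol/L × fasting insulin in μU/mL / 22.5) and shows how a Welch's t statistic, with Welch–Satterthwaite degrees of freedom, would be computed for a continuous variable between the NAFLD and non-NAFLD groups. The function names are illustrative only. For p values, `scipy.stats.ttest_ind(a, b, equal_var=False)` (Welch) and `scipy.stats.mannwhitneyu` (rank-sum, for skewed variables such as insulin) are the usual library routes.

```python
from math import sqrt
from statistics import mean, variance


def homa_ir(glucose_mmol_l: float, insulin_uu_ml: float) -> float:
    """HOMA1-IR, assuming glucose in mmol/L and insulin in uU/mL."""
    return glucose_mmol_l * insulin_uu_ml / 22.5


def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two groups."""
    # Squared standard errors of each group mean (sample variance / n).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df
```

Welch's version does not assume equal variances in the two groups, which matters here because the NAFLD group (about 15% of participants) is much smaller than the non-NAFLD group.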

Epigenome-wide DNA methylation association analysis

For the EWAS of ultrasound liver steatosis score, we used linear mixed-effects models. Four models were fitted to assess the robustness of associations (internal validation): (i) Model 1 tested CpG methylation adjusted for age, sex, white blood cell count, principal components derived from genome-wide genotype data, and technical covariates, with plate number as a random effect; (ii) Model 2 added the Houseman cell count estimates to the Model 1 covariates; (iii) Model 3 included all Model 2 covariates except the principal components; and (iv) Model 4 included the Model 1 covariates plus assayed blood cell counts (red blood cells, neutrophils, lymphocytes, eosinophils, basophils).
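As a sketch of the per-CpG model (our notation, not the authors'), Model 1 for CpG j can be written as below. Here S_i is the steatosis score of participant i, M_ij is methylation at CpG j, X_i collects the fixed-effect covariates (age, sex, white blood cell count, genotype principal components, and the remaining technical covariates), and u_p(i) is a random intercept for participant i's plate. Under this reading, Models 2–4 differ only in which covariates enter X_i.

```latex
S_i = \beta_{0j} + \beta_{1j} M_{ij} + \boldsymbol{\gamma}_j^{\top} \mathbf{X}_i + u_{p(i)} + \varepsilon_{ij},
\qquad u_p \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_{\varepsilon}^2)
```

The EWAS p value for CpG j then corresponds to a test of H0: β_1j = 0, with the model refitted separately for each CpG on the array.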

Overlap with adult CpGs identified in NAFLD meta-analysis

We investigated 22 dmCpGs previously associated with liver fat accumulation in adults [6]. Because this was a hypothesis-driven test of whether these dmCpGs already show association at an earlier age, statistical significance was defined using a Bonferroni-corrected threshold of p < 0.05/22 = 2.3 × 10⁻³.
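The threshold arithmetic is 0.05/22 ≈ 2.27 × 10⁻³, which rounds to the 2.3 × 10⁻³ quoted. A minimal Python sketch of the same rule follows; the helper name and the CpG identifiers in the usage note are hypothetical placeholders, not results from Table 3.

```python
def bonferroni_significant(pvalues: dict[str, float], alpha: float = 0.05) -> list[str]:
    """Return CpG IDs whose p value falls below alpha divided by the number of tests."""
    threshold = alpha / len(pvalues)  # e.g. 0.05 / 22 for the 22 adult dmCpGs
    return sorted(cpg for cpg, p in pvalues.items() if p < threshold)
```

With 22 tested CpGs, a p value of 0.001 passes the corrected threshold while 0.003 does not, even though both are nominally significant at p < 0.05.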

Pyrosequencing validation

CpGs were selected for pyrosequencing if they belonged to genes represented by two or more dmCpGs that were among the 100 most significantly associated CpGs in EWAS Model 3 and were significant in all four EWAS models at p < 0.007. Four dmCpGs (cg01572694 MIR10A, cg05821571 PTPRN2, cg19537719 ANK1, cg27650870 ANK1) in three genes met these criteria. Sodium bisulphite pyrosequencing was carried out on whole-blood DNA samples collected at age 17 as described [21] (Supplementary Table 1). Pyrosequencing of PCR products (10 μl) was used to measure DNA methylation (%) at sixteen dmCpGs (Pyro-Q-CpG 1.0.9 software, Supplementary Table 2) across the three genes of interest (ANK1, MIR10A, PTPRN2). Agreement between methylation measured by pyrosequencing and by the EWAS array at the four selected dmCpGs was assessed using Bland–Altman plots, a one-sample t-test of the difference, and linear regression of the difference in methylation (dependent variable) on mean methylation (independent variable).
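The pyrosequencing inclusion criteria can be read as a three-step filter. The sketch below encodes one plausible reading: rank CpGs by their Model 3 p value, keep those in the top 100 that also reach p < 0.007 in all four models, then keep genes with at least two such CpGs. The `CpGResult` record and the function name are hypothetical. The text also does not say how CpGs without a gene annotation were handled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CpGResult:
    cpg: str
    gene: str
    p: tuple[float, float, float, float]  # p values from EWAS models 1-4


def select_for_pyrosequencing(
    results: list[CpGResult], top_n: int = 100, p_max: float = 0.007, min_per_gene: int = 2
) -> dict[str, list[str]]:
    """Return {gene: [CpG IDs]} for genes meeting the (assumed) inclusion criteria."""
    # Step 1: keep the top_n CpGs ranked by Model 3 p value (index 2).
    top = sorted(results, key=lambda r: r.p[2])[:top_n]
    # Step 2: require significance at p < p_max in all four models.
    passing = [r for r in top if all(p < p_max for p in r.p)]
    # Step 3: group by gene and keep genes represented by enough CpGs.
    by_gene: dict[str, list[str]] = defaultdict(list)
    for r in passing:
        by_gene[r.gene].append(r.cpg)
    return {gene: cpgs for gene, cpgs in by_gene.items() if len(cpgs) >= min_per_gene}
```

The order of the first two steps matters: applying the p < 0.007 filter before taking the top 100 could admit CpGs that rank lower in Model 3.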

Pyrosequencing association analysis

Three statistical models assessed the association of pyrosequenced dmCpG methylation with steatosis score or NAFLD: (1) Model 1 adjusted for age and sex; (2) Model 2 adjusted for age, sex, and five Houseman cell count estimates (CD4T, CD8T, B cell, NK, and monocytes), with granulocytes omitted because of high collinearity with steatosis score and NAFLD [22]; and (3) Model 3 additionally included waist circumference as a covariate to assess whether the associations were influenced by adiposity. We also tested whether these dmCpGs were associated with three markers of liver biochemistry (GGT, ALT, AST).
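The pyrosequencing association models can be sketched as follows (our notation). Because Fig. 2 reports β-coefficients for steatosis score and odds ratios for NAFLD, we assume linear regression for the steatosis score S_i and logistic regression for NAFLD status N_i. M_i is pyrosequenced methylation (%), C_i the five cell count estimates, and W_i waist circumference. Model 1 sets the cell count and waist terms to zero and Model 2 sets only the waist terms to zero. Model 3 is shown including the cell counts, which is our reading of the text rather than a stated detail.

```latex
\begin{aligned}
S_i &= \beta_0 + \beta_1 M_i + \beta_2\,\mathrm{age}_i + \beta_3\,\mathrm{sex}_i + \boldsymbol{\delta}^{\top}\mathbf{C}_i + \theta W_i + \varepsilon_i \\
\operatorname{logit} \Pr(N_i = 1) &= \alpha_0 + \alpha_1 M_i + \alpha_2\,\mathrm{age}_i + \alpha_3\,\mathrm{sex}_i + \boldsymbol{\eta}^{\top}\mathbf{C}_i + \kappa W_i \\
\mathrm{OR}_{\mathrm{NAFLD}} &= e^{\alpha_1} \quad \text{(per 1 percentage-point increase in methylation)}
\end{aligned}
```

The liver enzyme analyses would follow the same linear form as the first line, with GGT, ALT, or AST in place of S_i.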

All analyses were performed using the statistical package R, version 3.0 or above.

Results

Raine Study Gen2-17: NAFLD phenotype prevalence and characteristics

Table 1 summarizes the characteristics of the 707 adolescents with liver assessment, genome-wide genotyping, and epigenome-wide profiling at the Raine Study Gen2-17 follow-up. The overall prevalence of NAFLD was 14.5% and was higher in females than in males (17.4% vs 11.8%, p = 0.02). NAFLD was associated with higher adiposity, HOMA-IR, serum ALT, GGT, and hsCRP, and with lower adiponectin (Table 1).

Table 1 Demographic, anthropometric and biochemical phenotypes of NAFLD and non-NAFLD adolescent Raine Study Gen2-17 participants

Epigenome-wide DNA methylation association with adolescent NAFLD

DNA samples from 707 adolescents (52.1% male) who had undergone ultrasound assessment for NAFLD were included in the EWAS. Our selection criteria identified eight dmCpGs in three genes: three dmCpGs (cg19537719, cg27650870, and cg18614735) in ankyrin-1 (ANK1, chromosome 8p11), three dmCpGs (cg04514255, cg01572694, and cg15649236) in microRNA 10a (MIR10A, chromosome 17q21), and two dmCpGs (cg22676516 and cg05821571) in protein tyrosine phosphatase receptor type N2 (PTPRN2, chromosome 7q36) (Fig. 1, Table 2). Supplementary Table 3 shows the fully annotated EWAS results for all four models.

Fig. 1

Manhattan plots of −log10 p value versus chromosomal position for each CpG in the four epigenome-wide association models, with steatosis score as the outcome. Panel A: Model I adjusted for age, sex, white cell count, the first two principal components derived from genome-wide genotype data, and technical covariates. Panel B: Model II additionally included the Houseman cell count estimates. Panel C: Model III removed the principal components. Panel D: Model IV used assayed blood cell counts (red blood cells, neutrophils, lymphocytes, eosinophils, basophils) in place of the Houseman cell count estimates. Eight CpGs in three genes were identified for follow-up.

Table 2 Association of pyrosequenced CpG loci from three genes (ANK1, MIR10A, PTPRN2) with steatosis score and NAFLD

Validation by pyrosequencing of NAFLD-associated CpG loci in adolescence

Four dmCpGs were selected for validation by pyrosequencing: cg19537719 and cg27650870 in the gene body of ANK1, cg01572694 near MIR10A, and cg05821571 in the gene body of PTPRN2. Supplementary Fig. 1 shows Bland–Altman plots comparing DNA methylation levels measured by the EWAS array and by pyrosequencing at these CpG loci.
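The agreement statistics behind such Bland–Altman comparisons (bias, 95% limits of agreement, a one-sample t-test of the differences, and the slope of the difference on the mean) can be sketched in plain Python. The sketch assumes array beta values have already been converted to percentages so both platforms share a scale, and it takes the difference as array minus pyrosequencing. The function name and the conventional ±1.96 SD limits are our choices, not details reported in the text.

```python
from math import sqrt
from statistics import mean, stdev


def bland_altman(array_pct: list[float], pyro_pct: list[float]) -> dict:
    """Bias, limits of agreement, one-sample t statistic, and proportional-bias slope."""
    if len(array_pct) != len(pyro_pct) or len(array_pct) < 3:
        raise ValueError("need paired measurements for at least three samples")
    diffs = [a - p for a, p in zip(array_pct, pyro_pct)]        # array minus pyrosequencing
    means = [(a + p) / 2 for a, p in zip(array_pct, pyro_pct)]  # per-sample mean methylation
    bias, sd = mean(diffs), stdev(diffs)
    # One-sample t statistic testing whether the mean difference is zero.
    t_stat = bias / (sd / sqrt(len(diffs)))
    # Ordinary least-squares slope of difference (dependent) on mean (independent).
    mx, my = mean(means), bias
    slope = sum((x - mx) * (y - my) for x, y in zip(means, diffs)) / sum((x - mx) ** 2 for x in means)
    return {"bias": bias, "loa": (bias - 1.96 * sd, bias + 1.96 * sd), "t": t_stat, "slope": slope}
```

A slope clearly different from zero would indicate proportional bias, that is, disagreement between platforms that changes with the methylation level. This is relevant for heavily methylated loci such as those in PTPRN2.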

Associations of the 16 pyrosequenced dmCpGs from the three genes identified in the EWAS with steatosis score and NAFLD are shown in Fig. 2 and Table 2. In models adjusted for age and sex, 13 pyrosequenced dmCpGs were associated with steatosis score and 12 with NAFLD, consistent with the EWAS results. After further adjustment for estimated cell-type proportions, 8 dmCpGs remained associated with steatosis score and 5 with NAFLD. When waist circumference was included as a measure of adiposity, 6 dmCpGs remained significant for steatosis score and 5 for NAFLD.

Fig. 2

Forest plots of regression results for DNA methylation at 16 pyrosequenced CpGs (ANK1, MIR10A, PTPRN2) with steatosis score and NAFLD as outcomes. β-coefficients (steatosis score), odds ratios (NAFLD), and 95% CIs are shown for models adjusted for age and sex (model 1), additionally adjusted for five estimated cell count variables (CD4T, CD8T, B cell, NK cells, monocytes), and further accounting for waist circumference.

ANK1 CpGs were the most consistent across all outcomes and models, with 5 dmCpGs associated with NAFLD. When waist circumference was included in the model, the ANK1 CpGs became more significant (Fig. 2). The dmCpG at 8:41583512 (ANK1 CpG_10) was the most significantly associated with NAFLD across all pyrosequencing models.

For MIR10A, three dmCpGs (cg01572694 (MIR10A CpG_10), MIR10A CpG_7, and MIR10A CpG_9) were associated with steatosis score, and one (MIR10A CpG_7) was associated with NAFLD, after cell count adjustment. When waist circumference was included in the steatosis score and NAFLD models, none of the MIR10A dmCpGs remained significant. None of the three PTPRN2 dmCpGs measured by pyrosequencing was associated with NAFLD. Full results are shown in Table 2.

Association of CpGs identified by pyrosequencing with additional biochemical markers of liver function

We investigated the association of the sixteen pyrosequenced CpGs with three liver biochemical markers (GGT, ALT, AST) using linear regression (Supplementary Table 4 and Fig. 3). All seven MIR10A CpGs were associated with ALT and GGT in models adjusted for age and sex. After further adjustment for cell counts, four MIR10A CpGs remained associated with ALT and six with GGT. Three ANK1 CpGs were associated with ALT and AST in both models. For GGT, five ANK1 CpGs were associated in the age- and sex-adjusted model, and three remained associated after cell count adjustment.

Fig. 3

Forest plots of regression results for DNA methylation at 16 pyrosequenced CpGs (ANK1, MIR10A, PTPRN2) with liver enzymes (ALT, AST, GGT). β-coefficients and 95% CIs are shown for models adjusted for age and sex (model 1) and additionally for five estimated cell count variables (CD4T, CD8T, B cell, NK cells, monocytes).

Overlap with CpGs identified in an adult NAFLD EWAS meta-analysis

We investigated whether 22 dmCpGs associated with liver fat in adults [6] were associated with NAFLD in adolescence (Table 3). After Bonferroni correction for multiple testing (p < 0.05/22 = 2.3 × 10⁻³), one adult dmCpG (cg11024682) was associated with steatosis score and three (cg14476101, cg26894079, cg11024682) with NAFLD. Thus, cg11024682 was associated with both steatosis score and NAFLD, while cg14476101 and cg26894079 were associated with NAFLD and nominally associated (p < 0.05) with steatosis score.

Table 3 Association of 22 adult non-alcoholic fatty liver disease associated dmCpGs with steatosis score and NAFLD in adolescence

Discussion

We conducted an EWAS in a well-characterized cohort of adolescents to identify DNA methylation signatures in whole blood associated with ultrasound-defined NAFLD. We identified dmCpGs in three genes (ANK1, MIR10A, PTPRN2) that were associated with steatosis score. Using pyrosequencing, we confirmed associations in one of these genes (ANK1) with both steatosis score and NAFLD after accounting for waist circumference and cell count heterogeneity. Further investigation of these CpGs with three liver biochemical markers identified CpGs in both ANK1 and MIR10A that were also associated with ALT, AST, and GGT after cell count adjustment. This consistency in the direction of effect across outcomes strongly supports the biological relevance of these associations.

ANK1 contains an ankyrin repeat domain, which mediates interactions between cytoskeletal and membrane proteins [23]. ANK1 protein has been found in circulating extracellular vesicles in animal models of non-alcoholic steatohepatitis (NASH), suggesting a role in cell-to-cell signaling [24]. Genetic variants in ANK1 are associated with susceptibility to diabetes [25] and with the Mendelian disorder hereditary spherocytosis, a hemolytic anemia that can lead to jaundice and enlargement of the spleen and liver in pediatric patients [26]. In addition, ANK1 DNA methylation in newborns correlates with maternal pre-pregnancy BMI [27]. We have previously shown that maternal BMI is a significant and independent risk factor for NAFLD in adolescent offspring [9]. Our data suggest that epigenetic changes in ANK1 may link maternal obesity to subsequent NAFLD in offspring; alternatively, obesity may alter DNA methylation, which in turn influences NAFLD in adolescence [28].

An adult EWAS meta-analysis of NAFLD identified 22 significant dmCpGs [6]. Three of these 22 dmCpGs were significantly associated with NAFLD in adolescents, with the same direction of effect. This suggests that adolescence is an important transitional period for understanding the development of NAFLD, and that adolescent NAFLD may represent the earliest stage of hepatic steatosis or an intermediate stage between childhood and adult NAFLD. While adults and adolescents share risk factors including obesity and insulin resistance, little is known about DNA methylation patterns in liver tissue during childhood or adolescence. Alternatively, the observed DNA methylation may be a consequence of NAFLD rather than a cause, as seen in obesity [29]; if so, at the age we examined, exposure to elevated liver fat may not yet have lasted long enough to induce differential methylation at all 22 dmCpGs identified in adults [6]. Other differences between the studies may also contribute.

A limitation of our study is its modest sample size, which limits statistical power to detect small effect sizes [30]. However, the Raine Study Gen2-17 is one of the largest population-based cohorts of adolescents with liver ultrasound assessment for NAFLD. Although ultrasound is less sensitive than histology for detecting mild hepatic steatosis, our study used a validated, standardized imaging protocol to diagnose NAFLD. Our results are also supported by concordant findings with liver enzymes and metabolic risk factors. Furthermore, the European Association for the Study of the Liver recommends liver ultrasound, rather than liver biopsy, as the preferred initial assessment of individuals suspected of having NAFLD [31]. DNA methylation was measured in whole blood, using estimated cell counts to correct for potential cell-type differences, but methylation at these candidate loci in liver tissue is unknown. However, the associations with GGT and ALT, well-known surrogate markers of fatty liver, provide additional support for our findings. We validated our EWAS signal by pyrosequencing in only two (ANK1, MIR10A) of the three genes identified; this may reflect the very high methylation levels (> 85%) at PTPRN2. Lastly, the role of these dmCpGs in regulating transcription is unknown, so inferring that they are causally involved in the etiology of NAFLD requires further mechanistic evaluation, including of the role of DNA methylation in leukocyte differentiation and function.

In summary, we conducted an EWAS of hepatic steatosis score and NAFLD in a well-characterized adolescent cohort using a two-stage approach. First, an epigenome-wide analysis identified dmCpGs in three genes that were differentially methylated in relation to steatosis score. Second, we validated loci by pyrosequencing and confirmed the association of loci in one gene (ANK1) with NAFLD after accounting for cell count heterogeneity and adiposity. In addition, several of these dmCpGs were associated with the traditional liver biochemical markers GGT and ALT, supporting the primary results. These findings require replication in additional cohorts, and further mechanistic research is needed to determine how changes in ANK1 methylation influence NAFLD, or how NAFLD may influence ANK1 methylation and gene expression. Based on our informatic analysis, we speculate that methylation changes at CpG sites involved in cell–cell signaling and MIR10A-controlled TGF-β pathways act during childhood to produce early changes of fatty liver in adolescence, with implications for NAFLD onset and progression in adulthood.