Differential DNA methylation of steatosis and non-alcoholic fatty liver disease in adolescence

Background and aims Epigenetic modifications are associated with hepatic fat accumulation and non-alcoholic fatty liver disease (NAFLD). However, few epigenetic modifications directly implicated in such processes have been identified during adolescence, a critical developmental window where physiological changes could influence future disease trajectory. To investigate the association between DNA methylation and NAFLD in adolescence, we undertook discovery and validation of novel methylation marks, alongside replication of previously reported marks.
Approach and results We performed a DNA methylation epigenome-wide association study (EWAS) on DNA from whole blood from 707 Raine Study adolescents phenotyped for steatosis score and NAFLD by ultrasound at age 17. Next, we performed pyrosequencing validation of loci within the 100 most strongly associated differentially methylated CpG sites (dmCpGs) for which ≥ 2 probes per gene remained significant across four statistical models with a nominal p value < 0.007. EWAS identified dmCpGs related to three genes (ANK1, MIR10A, PTPRN2) that met our criteria for pyrosequencing. Of the dmCpGs and surrounding loci that were pyrosequenced (ANK1 n = 6, MIR10A n = 7, PTPRN2 n = 3), three dmCpGs in ANK1 and two in MIR10A were significantly associated with NAFLD in adolescence. After adjustment for waist circumference, only dmCpGs in ANK1 remained significant. These ANK1 CpGs were also associated with γ-glutamyl transferase and alanine aminotransferase concentrations. Three of twenty-two dmCpGs previously associated with adult NAFLD were associated with NAFLD in adolescence (all adjusted p < 2.3 × 10⁻³).
Conclusions We identified novel DNA methylation loci associated with NAFLD and serum liver biochemistry markers during adolescence, implicating putative dmCpG/gene regulatory pathways and providing insights for future mechanistic studies.
Supplementary Information The online version contains supplementary material available at 10.1007/s12072-022-10469-7.


Graphical abstract
Keywords Epigenetics · EWAS · ANK1 · MIR10A · PTPRN2

Abbreviations
NAFLD  Non-alcoholic fatty liver disease
EWAS  Epigenome-wide DNA methylation association study
dmCpG  Differentially methylated CpG site
GGT  γ-Glutamyl transferase
ALT  Alanine aminotransferase
AST  Aspartate aminotransferase
ANK1  Ankyrin-1
HOMA-IR  Homeostasis model assessment of insulin resistance
MIR10A  MicroRNA 10a
PTPRN2  Protein tyrosine phosphatase receptor type N2

Introduction
Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease in developed countries [1]. NAFLD pathogenesis is thought to be multifactorial, influenced by lifestyle, diet, and genetics [2] but dominated by elevated central adiposity [3]. Epigenetic processes are key components linking environment, genetics, and metabolic disease risk [4]. Previous epigenome-wide DNA methylation association studies (EWAS) have identified differentially methylated CpG sites (dmCpGs) associated with NAFLD in adults [5,6]. Adolescents and children have rarely been studied [7], but the influences of early life exposures on the epigenome may be clearer and interventions more impactful during times of physiological plasticity [8].
Adolescents with NAFLD may be at the earliest stages of disease, and confounders such as alcohol consumption, type 2 diabetes mellitus and other metabolic diseases are less prevalent than in adults. Rapid physiological changes in adolescence denote a period where dysmetabolism may initiate liver damage [9]. Hence, well-characterized adolescent cohorts with a substantial prevalence of NAFLD provide an opportunity to investigate the association between epigenetic variation and NAFLD.
This study aimed to identify dmCpGs in adolescents with NAFLD. We performed a cross-sectional EWAS from whole blood in the population-based Raine Study, where detailed liver assessment had been undertaken at age 17 [10]. The use of whole blood is well established in EWAS, as it represents a relatively accessible tissue for biochemical analysis in population studies. Following EWAS, we validated the most strongly associated dmCpGs using pyrosequencing and examined their relationship with additional measures of liver biochemistry (γ-glutamyl transferase (GGT), alanine aminotransferase (ALT), and aspartate aminotransferase (AST)). Finally, we examined DNA methylation of 22 dmCpGs associated with NAFLD in adults [6].

The Raine study
The Raine Study is a longitudinal cohort study initiated in 1989-1992 in Perth, Western Australia, as a cohort of pregnant women ("Gen1") and their offspring ("Gen2"). The Raine Study Gen2 cohort is representative of the general population of Western Australia, as described in detail elsewhere [11]. The current cross-sectional follow-up study was performed when the cohort had reached approximately 17 years of age (Gen2-17); 1170 participants underwent assessment including (i) a detailed health questionnaire; (ii) anthropometric assessment; (iii) abdominal ultrasonography; and (iv) fasting biochemistry.

Steatosis score and NAFLD definition
NAFLD was diagnosed by ultrasound-confirmed hepatic steatosis and a daily alcohol consumption < 10 g for females and < 20 g for males [12]. Trained sonographers performed ultrasound using a Siemens Antares ultrasound machine with a CH 6-2 curved array probe (Sequoia, Siemens Medical Solutions, Mountain View CA), according to a standardized protocol [13]. A single radiologist interpreted images and scored hepatic steatosis severity based upon echotexture, deep attenuation, and vessel blurring (0-1 no steatosis, 2 mild steatosis, and 3-6 moderate-severe steatosis). The intra-observer reliability (κ statistic) for fatty liver was 0.78 (95% confidence interval [CI] 0.73-0.88). Testing for hepatitis B or C virus infections was not performed because notification rates were on average less than 24/100,000 and 23/100,000, respectively, for Western Australian adolescents aged 15-19 years over the study period [12].
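As an illustration only, the scoring bands and alcohol thresholds described above can be written as a small classification rule. This is a minimal Python sketch, not the study's code (the analyses were run in R); the function names and the "female"/"male" keys are hypothetical.

```python
def steatosis_category(score: int) -> str:
    """Map the 0-6 ultrasound steatosis score to the bands given in the text."""
    if not 0 <= score <= 6:
        raise ValueError("steatosis score must be between 0 and 6")
    if score <= 1:
        return "none"
    if score == 2:
        return "mild"
    return "moderate-severe"


# Daily alcohol limits (g/day) from the text; the dictionary keys are illustrative.
ALCOHOL_LIMIT_G_PER_DAY = {"female": 10.0, "male": 20.0}


def has_nafld(score: int, sex: str, alcohol_g_per_day: float) -> bool:
    """NAFLD = ultrasound steatosis (score >= 2) with alcohol intake below the sex-specific limit."""
    steatosis = steatosis_category(score) != "none"
    return steatosis and alcohol_g_per_day < ALCOHOL_LIMIT_G_PER_DAY[sex]
```

For example, `has_nafld(3, "female", 5.0)` is true, whereas the same score with an intake of 12 g/day would not meet the definition for a female participant.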

Univariate analysis
A total of 707 of the original 1170 Raine Study Gen2-17 participants who had undergone assessment for NAFLD had complete epigenome and covariate data and were included in the statistical analysis. Continuous demographic and biochemical variables were compared by NAFLD status using Student's t-test or Welch's one-way test if normally distributed, and Kruskal-Wallis or Wilcoxon rank sum tests if skewed. Associations of binary variables with NAFLD were assessed using t-tests for parametric variables and Mann-Whitney U tests for non-parametric variables. Measures of adiposity were BMI and waist circumference, while liver biochemistry comprised serum GGT, ALT, and AST [12]. Measures of insulin metabolism were fasting glucose and insulin, and the homeostasis model assessment of insulin resistance (HOMA-IR). Serum high-sensitivity C-reactive protein (hsCRP), leptin and adiponectin were also measured [12].
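The text does not state how HOMA-IR was calculated. As a hedged illustration, the sketch below uses the widely used approximation (fasting glucose in mmol/L multiplied by fasting insulin in µU/mL, divided by 22.5); whether the study used exactly this form and these units is an assumption, and the function name is hypothetical.

```python
def homa_ir(fasting_glucose_mmol_l: float, fasting_insulin_uU_ml: float) -> float:
    """Common HOMA-IR approximation: glucose (mmol/L) x insulin (uU/mL) / 22.5.

    Assumed formula and units; the source text does not specify them.
    """
    if fasting_glucose_mmol_l < 0 or fasting_insulin_uU_ml < 0:
        raise ValueError("glucose and insulin must be non-negative")
    return fasting_glucose_mmol_l * fasting_insulin_uU_ml / 22.5


# Illustrative values only, not study data.
example = homa_ir(5.0, 9.0)
```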

Epigenome-wide DNA methylation association analysis
For EWAS with ultrasound liver steatosis scores, we used linear mixed effects models. Four models were analysed for internal validation: (i) Model 1 adjusted for CpG, age, sex, white blood cell count, principal components derived from genome-wide genotype data, and technical covariates, with plate number as the random effect; (ii) Model 2 included the variables from model 1 plus Houseman cell count estimates; (iii) Model 3 used all model 2 variables except the principal components; and (iv) Model 4 included the model 1 covariates plus assayed blood cell counts (red blood cells, neutrophils, lymphocytes, eosinophils, basophils).
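To make the relationship between the four covariate sets explicit, the sketch below builds them from shared pieces and renders each as an lme4-style formula string with a random intercept for plate. It is an illustrative Python outline only: the analyses were run in R, every column name is a hypothetical placeholder (for example, `genotype_pcs` and `technical_covariates` stand in for several columns each), and no model is actually fitted.

```python
# Hypothetical covariate names; each list mirrors one description in the text.
BASE = ["cpg", "age", "sex", "wbc_count", "genotype_pcs", "technical_covariates"]
HOUSEMAN = ["houseman_cell_estimates"]
ASSAYED = ["red_blood_cells", "neutrophils", "lymphocytes", "eosinophils", "basophils"]

MODELS = {
    "model1": BASE,
    "model2": BASE + HOUSEMAN,
    # Model 3: model 2 without the genotype principal components.
    "model3": [c for c in BASE + HOUSEMAN if c != "genotype_pcs"],
    "model4": BASE + ASSAYED,
}


def mixed_model_formula(covariates, outcome="steatosis_score", random_effect="plate"):
    """Render fixed effects plus a random intercept in lme4-style notation."""
    fixed = " + ".join(covariates)
    return f"{outcome} ~ {fixed} + (1 | {random_effect})"


for name, covariates in MODELS.items():
    print(name, "->", mixed_model_formula(covariates))
```

Writing the sets this way makes it easy to see that model 3 differs from model 2 only by omitting the genotype principal components, and that model 4 replaces estimated cell proportions with assayed counts.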

Overlap with adult CpGs identified in NAFLD meta-analysis
We investigated 22 dmCpGs previously associated with liver fat accumulation in adults [6]. A Bonferroni-corrected threshold of p < 0.05/22 = 2.3 × 10⁻³ was used to define statistical significance, because we were testing the hypothesis that these dmCpGs show a signal at an earlier age.
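The threshold arithmetic is simple enough to check directly. The following Python sketch reproduces it; the function names are illustrative and not taken from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def is_significant(p_value: float, alpha: float = 0.05, n_tests: int = 22) -> bool:
    """True if p_value falls below the corrected threshold."""
    return p_value < bonferroni_threshold(alpha, n_tests)


threshold = bonferroni_threshold(0.05, 22)
print(f"{threshold:.2e}")  # 2.27e-03, reported in the text as 2.3 x 10^-3
```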

Pyrosequencing validation
Inclusion criteria for CpG pyrosequencing were genes represented by 2 or more dmCpGs that were within the top 100 most significantly associated CpGs in statistical model 3 and that were significant across all four statistical models at p < 0.007. Four dmCpGs (cg01572694 MIR10A, cg05821571 PTPRN2, cg19537719 ANK1, cg27650870 ANK1) in three genes passed these criteria. Sodium bisulphite pyrosequencing was carried out on whole blood DNA samples at age 17 as described [21] (Supplementary Table 1). Pyrosequencing was carried out using PCR products (10 μl) to measure DNA methylation (%) of sixteen dmCpGs (Pyro-Q-CpG 1.0.9 software, Supplementary Table 2) across the three genes of interest (ANK1, MIR10A, PTPRN2). Agreement between methylation from pyrosequencing and EWAS arrays was assessed for four dmCpGs (cg01572694 MIR10A, cg05821571 PTPRN2, cg19537719 ANK1, cg27650870 ANK1) by Bland-Altman plots, a one-sample t-test of the difference, and linear regression of the difference in methylation (dependent variable) on mean methylation (independent variable).
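One possible reading of the inclusion criteria is sketched below in Python: keep CpGs that rank within the top 100 in model 3 and have p < 0.007 in all four models, then keep genes with at least two such CpGs. This is an interpretation for illustration, not the authors' pipeline; the record layout, field names, and toy identifiers (`cpg_a`, `GENE_X`, and so on) are hypothetical.

```python
from collections import defaultdict


def select_for_pyrosequencing(records, top_n=100, p_cutoff=0.007, min_cpgs_per_gene=2):
    """Return {gene: [cpg, ...]} for genes meeting the stated criteria."""
    by_gene = defaultdict(list)
    for rec in records:
        in_top = rec["rank_model3"] <= top_n                    # top 100 in model 3
        all_sig = all(p < p_cutoff for p in rec["p_values"])    # p < 0.007 in every model
        if in_top and all_sig:
            by_gene[rec["gene"]].append(rec["cpg"])
    return {g: cpgs for g, cpgs in by_gene.items() if len(cpgs) >= min_cpgs_per_gene}


# Toy records (not study data): p_values holds one p value per statistical model.
records = [
    {"cpg": "cpg_a", "gene": "GENE_X", "rank_model3": 5, "p_values": [0.001, 0.002, 0.001, 0.003]},
    {"cpg": "cpg_b", "gene": "GENE_X", "rank_model3": 40, "p_values": [0.004, 0.005, 0.002, 0.006]},
    {"cpg": "cpg_c", "gene": "GENE_Y", "rank_model3": 12, "p_values": [0.001, 0.001, 0.001, 0.001]},
    {"cpg": "cpg_d", "gene": "GENE_Z", "rank_model3": 20, "p_values": [0.001, 0.02, 0.001, 0.001]},
    {"cpg": "cpg_e", "gene": "GENE_Z", "rank_model3": 150, "p_values": [0.001, 0.001, 0.001, 0.001]},
]
selected = select_for_pyrosequencing(records)
```

In the toy data only `GENE_X` is retained: `GENE_Y` has a single qualifying CpG, and both `GENE_Z` CpGs fail one of the filters.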

Pyrosequencing association analysis
Three statistical models assessed the association of DNA methylation of dmCpGs, determined by pyrosequencing, with steatosis score or NAFLD: (1) model 1 accounted for age and sex; (2) model 2 accounted for age, sex, and five Houseman cell count covariates (CD4T, CD8T, B cell, NK, and monocytes), with granulocytes removed because of high collinearity with steatosis score and NAFLD [22]; and (3) model 3 included waist circumference as a covariate to investigate whether the associated CpG was also influenced by adiposity. We also tested whether these dmCpGs were associated with three markers of liver biochemistry (GGT, ALT, AST).
All analyses were performed using the statistical package R, version 3.0 or above.

Results
Table 1 summarizes characteristics of the 707 adolescents who had liver assessments, genome, and epigenome-wide profiling at the Raine Study Gen2-17 follow-up. Overall prevalence of NAFLD was 14.5%, with a higher prevalence in females than males (17.4% vs 11.8%, p value = 0.02). NAFLD was associated with higher adiposity, HOMA-IR, serum ALT, GGT and hsCRP, and with lower adiponectin (Table 1).

Validation by pyrosequencing of NAFLD-associated CpG loci in adolescence
Four dmCpGs were selected for validation by pyrosequencing: cg19537719 and cg27650870 located in the gene body of ANK1, cg01572694 located near MIR10A, and cg05821571 in the gene body of PTPRN2. Supplementary Fig. 1 shows Bland-Altman plots comparing DNA methylation levels measured by the EWAS and pyrosequencing at these CpG loci. Association results for 16 pyrosequenced dmCpGs from the three genes identified during EWAS are shown in Fig. 2 and Table 2 for steatosis score and NAFLD. After accounting for sex and age, and consistent with the EWAS results, 13 pyrosequenced dmCpGs were associated with steatosis score and 12 with NAFLD. After adjustment for estimated cell type, 8 dmCpGs remained associated with steatosis score and 5 with NAFLD. When waist circumference was included as a measure of adiposity, 6 dmCpGs remained significant for steatosis score and 5 for NAFLD.
ANK1 CpGs were the most consistent across all outcomes and models, with 5 dmCpGs associated with NAFLD. When waist circumference was included in the model, the ANK1 CpGs became more significant (Fig. 2). The dmCpG 8:41583512 (ANK1 CpG_10) was the most significantly associated with NAFLD across all pyrosequencing models.
For MIR10A, 3 dmCpGs (cg01572694 (MIR10A CpG_10), MIR10A CpG_7, MIR10A CpG_9) were associated with steatosis score and one CpG (MIR10A CpG_7) was associated with NAFLD after cell count adjustment. When waist circumference was included in the steatosis score and NAFLD models, none of the MIR10A dmCpGs remained significant. None of the three PTPRN2 dmCpGs measured by pyrosequencing were associated with NAFLD. Full results are shown in Table 2.
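For readers unfamiliar with the Bland-Altman comparison used to check agreement between array and pyrosequencing methylation, the quantities behind such a plot can be sketched as follows. This is a minimal Python illustration with made-up values, not the study's R code: it computes the mean difference (bias), the conventional 95% limits of agreement (bias ± 1.96 SD of the differences), and the one-sample t statistic for the mean difference. The function name and return keys are hypothetical.

```python
import math
from statistics import mean, stdev


def bland_altman(array_values, pyro_values):
    """Bias, 95% limits of agreement, and one-sample t statistic of the paired differences."""
    if len(array_values) != len(pyro_values) or len(array_values) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")
    diffs = [a - p for a, p in zip(array_values, pyro_values)]
    means = [(a + p) / 2 for a, p in zip(array_values, pyro_values)]  # x-axis of the plot
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the differences
    t_stat = bias / (sd / math.sqrt(len(diffs))) if sd > 0 else float("inf")
    return {
        "bias": bias,
        "lower_loa": bias - 1.96 * sd,
        "upper_loa": bias + 1.96 * sd,
        "t_stat": t_stat,
        "means": means,
        "diffs": diffs,
    }


# Made-up methylation percentages for illustration only.
result = bland_altman([50.0, 60.0, 70.0, 80.0], [48.0, 59.0, 67.0, 78.0])
```

Plotting `diffs` against `means`, with horizontal lines at the bias and the two limits, gives the usual Bland-Altman display; regressing `diffs` on `means` would test for proportional bias.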

Association of CpGs identified by pyrosequencing with additional biochemical markers of liver function
We investigated the association of sixteen pyrosequenced CpGs with three liver biochemical markers (GGT, ALT, AST) using linear regression (Supplementary Table 4 and Fig. 3). All seven MIR10A CpGs were associated with ALT and GGT when accounting for age and sex. When further adjusted for cell count, four MIR10A CpGs remained associated with ALT and six with GGT. Three ANK1 CpGs were significant for ALT and AST in both statistical models. For GGT, five ANK1 CpGs were significant after adjustment for age and sex, and three remained associated after cell count adjustment.

Discussion
We conducted an EWAS in a well-characterized cohort of adolescents to identify specific DNA methylation signatures in whole blood associated with ultrasound-defined NAFLD. We identified dmCpGs in three genes (ANK1, MIR10A, PTPRN2) that were associated with steatosis score. Using pyrosequencing, associations in one of these genes (ANK1) were confirmed with both steatosis score and NAFLD after accounting for waist circumference and cell count heterogeneity. Further investigation of these specific CpGs with three liver biochemical markers identified CpGs in both ANK1 and MIR10A that were also associated with ALT, AST and GGT after cell count adjustment. The consistent direction of effect across these associations strongly supports our findings as biological associations.
Fig. 2 Forest plots of the results from regression models between DNA methylation levels at 16 pyrosequenced CpGs (ANK1, MIR10A, PTPRN2) and the outcomes of steatosis score and NAFLD. β-coefficients for steatosis score, odds ratios for NAFLD, and 95% CIs are shown for age and sex (model 1), adjusted for five estimated cell count variables (CD4T, CD8T, B cell, NK cells, monocytes), and accounting for waist circumference.
ANK1 contains an ankyrin repeat domain, which modulates interactions between cytoskeletal and membrane proteins [23]. ANK1 protein is found in circulating extracellular vesicles in animal models of non-alcoholic steatohepatitis (NASH), suggesting a role in cell-to-cell signaling [24]. Genetic variants in ANK1 are associated with susceptibility to diabetes [25] and with the Mendelian disorder hereditary spherocytosis, a type of hemolytic anemia known to lead to jaundice and enlargement of the spleen and liver in pediatric patients [26]. In addition, DNA methylation status of ANK1 in newborns correlates with maternal pre-pregnancy BMI in humans [27]. We have previously shown that maternal BMI is a significant and independent risk factor for adolescent NAFLD in offspring [9]. Our data suggest that epigenetic changes in ANK1 may represent a potential link between maternal obesity and subsequent childhood NAFLD; alternatively, this association may be driven by obesity acting on differential DNA methylation and subsequently influencing NAFLD in adolescence [28].
An adult EWAS meta-analysis of NAFLD identified 22 significant dmCpGs [6]. We identified 3 of these 22 dmCpGs as at least nominally significantly associated with NAFLD in adolescents, with the same direction of effect. This suggests that adolescence is an important transitional period in which to better understand the development of NAFLD, and that it may represent the earliest stage of hepatic steatosis or an intermediate stage between childhood and adult NAFLD. While adults and adolescents share risk factors including obesity and insulin resistance, little is known about the role of DNA methylation patterns in liver tissue during childhood or adolescence. Another possibility is that the observed DNA methylation does not cause the association but is rather a consequence of NAFLD, as seen in obesity [29]; if so, at the age we examined, the duration of exposure to elevated liver fat may not have been sufficient to induce differential DNA methylation at all 22 dmCpGs currently identified in adulthood [7]. The incomplete overlap may also reflect other differences between the studies.
A limitation of our study is the modest sample size, which limits statistical power to detect small effect sizes [30]. However, the Raine Study Gen2-17 is one of the largest population-based cohorts of adolescents with liver ultrasound assessment for NAFLD. Although ultrasound is less sensitive than histology for the detection of minor hepatic steatosis, our study utilized a validated and standardized imaging approach for the diagnosis of NAFLD. Our results are supported by overlapping findings with liver enzymes and metabolic risk factors. Furthermore, the European Association for the Study of the Liver recommends liver ultrasound, and not liver biopsy, as the preferred initial assessment of individuals suspected of having NAFLD [31]. DNA methylation was measured in whole blood, utilizing estimated cell counts to correct for potential cell type differences, but the DNA methylation of these candidate loci in liver tissue is unknown. However, the overlap with GGT and ALT, well-known surrogates for fatty liver, provides additional confirmation of our findings. We validated our EWAS signal using pyrosequencing in only two (ANK1, MIR10A) of the three genes identified; this may be a result of the very high (> 85%) methylation levels of the PTPRN2 gene. Lastly, the role of these dmCpGs in regulating transcription is unknown, so any inference that they are causally involved in the etiology of NAFLD requires further mechanistic evaluation, including of the role of DNA methylation in leukocyte differentiation and function.
In summary, we conducted EWAS with hepatic steatosis score and NAFLD in a well-characterized adolescent cohort using a two-stage approach. First, using a genome-wide approach, we identified dmCpGs relating to three genes that showed differential methylation in relation to steatosis score in adolescents. Second, we validated loci through pyrosequencing and confirmed the associations of loci in one gene (ANK1) with NAFLD after accounting for cell count heterogeneity and adiposity. In addition, we investigated the association of these CpGs with traditional liver biochemical markers and found that several dmCpGs were associated with GGT and ALT, supporting the previous findings. These findings require replication in additional cohorts, and further mechanistic research is needed to identify how changes in ANK1 methylation influence NAFLD or how NAFLD may influence ANK1 methylation and gene expression. Based on our informatic analysis, we speculate

Declarations
Conflict of interest K.M.G. has received reimbursement for speaking at conferences sponsored by companies selling nutritional products and is part of an academic consortium that has received research funding from Abbott Nutrition, Nestec and Danone. G.C.B. has received research funding from Nestle, Abbott Nutrition and Danone. He has served as a member of the Scientific Advisory Board of BASF and is a member of the BASF Asia-Pacific Grant Award Panel.

Informed consent
The study was approved by the University of Western Australia Human Ethics Committee and all participants gave their written informed consent.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.