Background

The period from childhood to adolescence is associated with rapid somatic growth and incorporates a range of gender-dependent physiological and behavioral changes, including hormonal, height and body mass index (BMI) changes, possible use of oral contraceptives, and possible initiation of nicotine use [1, 2]. This period is also significant for the development of lung function as it represents a phase of dramatic growth from childhood to adolescence to reach a maximal level of lung function in early adulthood [3,4,5]. Lung function growth is gender-dependent and such dependence is attributable to multiple biological determinants, including dimensional/anatomical (e.g., airway size, somatic growth, lung growth, adolescence growth spurts), immunological, and hormonal determinants such as different phases of the menstrual cycle and common hormonal and metabolic conditions [6,7,8,9].

DNA methylation (DNA-M), as a potential marker of past exposure or significant changes in life such as pubertal onset, is an epigenetic mechanism and has been shown to play an important role in human development and health. DNA-M refers to methylation of the 5′ position of the cytosine base of cytosine-phosphate-guanine dinucleotide sites (CpG sites or CpGs) in the DNA [10]. It regulates gene function through the modulation of gene expression. Imboden et al. 2019 [11] and others have demonstrated that DNA-M in whole blood is associated with lung function [12,13,14,15,16], risk of asthma [17], and chronic obstructive pulmonary disease (COPD) [12, 13, 15, 16]. When assessing the association of DNA-M with lung function, most previous studies have been cross-sectional with both lung function and DNA-M measured at single time points [12,13,14,15,16], although DNA-M at some CpGs changes over time [18,19,20,21,22]. In our recent genome-wide study, we identified more than 10 K CpGs where DNA-M significantly changes over the adolescence period, and at some CpGs, such changes were gender-dependent [23].

To our knowledge, at CpGs which are potentially associated with lung function parameters such as forced expiratory volume in one second (FEV1) and forced vital capacity (FVC), no studies have examined whether and how changes in DNA-M at those CpGs are associated with changes in lung function during adolescence. Such an investigation will improve our understanding of epigenetic mechanisms in lung function development. In addition, DNA-M changes at CpGs shown to be associated with changes in lung function have the potential to predict future lung function changes, which, in the long run, may lead to strategies for the prevention of pulmonary disease. Taken together, we hypothesized that during adolescence, changes of DNA-M at some CpGs are associated with changes in lung function. Given that changes during adolescence are gender-dependent, we examined this hypothesis separately in males and females. The study was carried out in a birth cohort located on the Isle of Wight (IOW) in the United Kingdom. To assess generalizability, the findings were further examined in two independent birth cohorts, Avon Longitudinal Study of Children and Parents Cohort (ALSPAC) in the United Kingdom and Children, Allergy, Milieu, Stockholm, Epidemiology (BAMSE) in Sweden.

Methods

Discovery cohort - IOW cohort

Study participants

The IOW cohort is a population-based birth cohort that was established in 1989 on the IOW, United Kingdom. Recruitment and the initial assessments were approved by the IOW Local Research Ethics Committee, and further assessments were approved by the National Research Ethics Service, Committee South Central – Southampton B (06/Q1701/34). Informed written consent was obtained from participants or their parents before participation. The study enrolled 1456 eligible children of 1536 born between January 1989 and February 1990 (after exclusion of adoptions, infant deaths, and denial). Details of the birth cohort of 1989 have been described elsewhere [24]. Longitudinal monitoring of diseases and assessments of environmental exposures in this cohort were conducted at birth and at ages 1, 2, 4, 10, 18, and 26 years. In the present study, we focused on data collected at ages 10 (n = 1373) and 18 (n = 1313) years. In total, 320 and 453 participants had both DNA-M and lung function data available at ages 10 and 18 years, respectively, including 301 participants who had data at both time points.

Lung function

Spirometric measurements, specifically FVC and FEV1, at ages 10 (n = 980) and 18 (n = 838) years were conducted using a Koko spirometer and software with a portable desktop device (both PDS Instrumentation, Louisville, KY, USA), and the ratio of FEV1 over FVC (FEV1/FVC) was calculated. Spirometry was conducted and evaluated according to the American Thoracic Society (ATS) guidelines [25, 26]. Participants were required to be free of respiratory infection and to have taken no oral steroids for two weeks. In addition, participants were instructed to abstain from any β-agonist medication for six hours and from caffeine intake for at least four hours.

Measuring DNA methylation (DNA-M)

Peripheral blood samples collected at ages 10 (n = 330) and 18 (n = 476) years from randomly selected subjects were used for DNA extraction via a standard salting out procedure [27]. DNA concentration was estimated by Qubit quantitation. For each sample, one microgram of DNA was bisulfite-treated for cytosine to thymine conversion using the EZ 96-DNA methylation kit (Zymo Research, Irvine, CA, USA), following the manufacturer’s protocol. DNA-M was measured using HumanMethylation450K or HumanMethylationEPIC BeadChips (Illumina, Inc., San Diego, CA, USA). Arrays were processed using a standard protocol as described elsewhere [28], with multiple identical control samples assigned to each bisulfite conversion batch to assess assay variability. DNA samples were randomly distributed on microarrays to control against batch effects. Intensities of methylated and unmethylated sites were measured.

Preprocessing

Probes not reaching a detection p-value of 10^-16 in at least 95% of samples were excluded. CpGs on sex chromosomes were also excluded to avoid potential bias in DNA-M, as there are parent-of-origin differences in methylation of paternally and maternally inherited X chromosomes [29]. DNA-M data were pre-processed using the “CPACOR” pipeline for data from both platforms [30]. DNA-M intensities were quantile normalized using the R computing package minfi [31]. DNA-M β values for each CpG were calculated as the ratio of methylated (M) intensity over the sum of methylated and unmethylated (U) intensities (β = M/[c + M + U]), interpreted as the percentage of methylation [32], where c is a constant that prevents a zero denominator. Principal components (PCs) inferred from control probes were used to represent latent variables due to chip-to-chip and technical (batch) variation. Since DNA-M data were from two different platforms (450K and EPIC), we determined the PCs based on DNA-M at control probes shared between the two platforms. The 450K BeadChips contained 220 control probes and the EPIC BeadChips contained 204 control probes, of which 195 overlapped between the two platforms. These 195 shared probes were then used to calculate the control probe PCs, the top 15 of which were used to represent latent batch factors [30].
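The β-value definition above can be illustrated with a minimal sketch. The function name, the example intensities, and the offset value c = 100 are assumptions made for illustration only; the text does not state which constant was used.

```python
def beta_value(methylated, unmethylated, c=100.0):
    """Return beta = M / (c + M + U) for one CpG in one sample.

    c is an assumed offset; it only keeps the denominator away from zero.
    """
    if methylated < 0 or unmethylated < 0:
        raise ValueError("intensities must be non-negative")
    return methylated / (c + methylated + unmethylated)


# Hypothetical (M, U) intensities for three probes; not study data.
intensities = [(3000.0, 1000.0), (0.0, 0.0), (500.0, 4500.0)]
betas = [beta_value(m, u) for m, u in intensities]
print(betas)
```

With c > 0, a probe with no signal at all yields β = 0 rather than a division error, and every β stays strictly below 1.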

After pre-processing, a total of 473,864 and 847,155 CpGs were available in the 450K and EPIC methylation array data, respectively, and 439,635 overlapping CpGs were identified between the two platforms. CpGs with a single nucleotide polymorphism (SNP) overlapping the detection probe, with minor allele frequency ≥ 0.7% in Caucasians (corresponding to at least 10 subjects in the IOW cohort with n = 1456) and within 10 base pairs of the targeted CpG, were excluded because of the potential bias that such SNPs introduce into the measurement of DNA-M. After excluding probe SNPs, 402,714 CpGs were included in the statistical analyses.
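The correspondence between the 0.7% threshold and 10 subjects can be checked with simple arithmetic, taking the subject-count reading given in the text at face value:

```latex
\[
\frac{10}{1456} \approx 0.0069 \approx 0.7\%
\]
```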

Confounders

Variables potentially associated with lung function change in addition to DNA-M change in adolescents are considered to be confounders, including changes in height and BMI, age of puberty onset, smoking status, socioeconomic status (SES), exposure to pets, exposure to air pollution, education status, farm exposure, paracetamol (acetaminophen) use, and non-steroidal anti-inflammatory drugs (NSAIDs) use [33,34,35,36].

Gender information was collected by questionnaire at each follow-up. Height was measured at 10 and 18 years of age before spirometric assessment. BMI was calculated from height and weight at ages 10 and 18 years. Changes in height and BMI from age 10 to 18 years were then calculated. The minimum age of puberty onset was estimated based on the following questions about the age of initiation of different pubertal changes: growth spurt of male or female, body hair growth of male or female, skin changes of male or female, deepening voice of male, facial hair of male, breast development of female, and initiation of menstruation of female. Smoking status was defined by the questions of current and past personal smoking status at age 18 years. A composite “SES-cluster” variable that accounts for SES broadly defined was used [37]. Family SES was clustered using: (a) British socioeconomic classes (1-6) derived from parental occupation reported at birth; (b) number of children in the index child’s bedroom (collected at age 4 years); and (c) family income at age 10 years [37]. This composite variable captures the family social class across the entire study period. Information on exposure to cats, dogs, and other animals was collected at both ages 10 and 18 years via questionnaire. Information on whether the subjects were still in education (yes/no), farm exposure (yes/no), how often health was affected by exposure to air pollution (never/every day/once a month/once a week/once a year), paracetamol use (frequency of taking paracetamol in a month), and use of NSAIDs (frequency of taking NSAIDs in a month) was collected by questionnaire at age 18 years.

Replication cohort – the ALSPAC cohort

The Avon Longitudinal Study of Children and Parents (ALSPAC) is a population-based birth cohort study established in 1991 in Avon, United Kingdom, approximately 75 miles from the IOW. Details of the cohort have been described elsewhere [38, 39]. Women residing in the South West of England who were pregnant and expecting to deliver between April 1, 1991 and December 31, 1992 were eligible to be recruited. In total, 14,541 pregnant women were eligible for the study, of whom 13,761 were included, with 10,321 providing DNA from blood samples. Participants were given questionnaires to gather information regarding the mother. Written informed consent was obtained for all ALSPAC participants. Ethical approval for the study was obtained from the ALSPAC Ethics and Law Committee and the Local Research Ethics Committees. Information on environment, lifestyle, and health of the child and family was collected through annual questionnaires since the child’s birth. From age 7 years, all participants were invited to an annual research clinic, and thus exposure and other demographic data were available annually from 7 to 17 years. The follow-up cohort was composed of 13,988 children, including multiple children from one family. In the replication study, we focused on ages 7 to 8 (7/8) and 15 years. Spirometry (Vitalograph 2120; Vitalograph, Maids Moreton, United Kingdom) was performed at 8 and 15 years of age according to ATS standards [26, 36], the same method as that applied in the IOW cohort. The study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/).

DNA-M in peripheral blood was assessed using the Infinium HumanMethylation450K BeadChip. The procedure for DNA sample preparation was comparable to that applied in the IOW cohort. DNA-M data of children at ages 7 (n = 966) and 15 (n = 966) years were available (twin participants were excluded). The pre-processing of DNA-M was performed by adjusting for batch effects, excluding CpGs with detection p-value ≥ 0.01, and excluding samples that were flagged as a sex mismatch based on X-chromosome methylation [40]. CpGs on sex chromosomes were not included in the analyses. Only fully characterized subjects with DNA-M and lung function at both ages (7/8 years and 15 years) were included in the replication study, which resulted in 691 paired samples.

Replication cohort – the BAMSE cohort

The Swedish Children, Allergy, Milieu, Stockholm, Epidemiology (BAMSE) cohort is an unselected, population-based cohort study of children from Stockholm, Sweden. During 1994–1996, a total of 4089 children were recruited at birth from four municipalities in Stockholm County and followed during childhood. The Regional Ethical Review Board, Karolinska Institute in Stockholm, Sweden, approved the baseline study with its follow-up. The cohort, the inclusion and enrollment criteria, and the procedure of data collection have been described in detail elsewhere [41]. Follow-up questionnaires focusing on the children’s respiratory health, allergic diseases, and various exposure factors were collected at ages 1, 2, 4, 8, and 16 years after obtaining informed consent from the parents of all participating children. At ages 8 (n = 1838) and 16 (n = 2063) years, lung function testing was conducted [42]. Maximal expiratory flow volume (MEFV) tests were performed at 8 and 16 years of age using the 2200 Pulmonary Function Laboratory (Sensormedics, Anaheim, CA, USA) and the Jaeger MasterScreen-IOS system (Carefusion Technologies, San Diego, CA), respectively [42, 43]. All children performed several MEFV measurements, and the maximal values of FVC and FEV1 were extracted for the analyses. The MEFV curves passed visual quality inspection, and the two highest FEV1 and FVC readings were reproducible according to ATS/European Respiratory Society criteria [26]. FEV1/FVC ratios were calculated. Height was measured before lung function testing for each participant.

DNA extracted from peripheral blood samples at ages 8 and 16 years of follow up was used to measure DNA-M [44]. For each sample, 500 ng DNA underwent bisulfite treatment for cytosine to thymine conversion using the EZ 96-DNA methylation kit (Shallow; Zymo Research Corporation, Irvine, CA, USA). DNA-M was assessed using the Illumina Infinium HumanMethylation450K BeadChip (Illumina, Inc.). After data preprocessing and quality control following the standard criteria [45], DNA-M data of 464 and 267 participants were available at ages 8 and 16 years, respectively.

Statistical analyses in the IOW cohort

To evaluate whether subjects included in the study reasonably represented those in the complete study cohort, we focused on the assessment of lung function at each age for both genders together and for each gender separately. To compare with the complete cohort, for continuous variables, including lung function, height, and BMI, one-sample t-tests were applied, and for categorical variables, including gender and smoking status, one-sample proportion tests were implemented.
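As a minimal sketch of the comparison described above, the test statistics can be computed as follows. The function names are hypothetical, the normal-approximation form of the proportion test is an assumption (the text does not specify the variant), and p-values are left out because they require a t or normal distribution function.

```python
import math
import statistics


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: mean(sample) == mu0."""
    n = len(sample)
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)  # sample SD with n - 1 in the denominator
    return (mean - mu0) / (sd / math.sqrt(n)), n - 1


def one_sample_prop_z(successes, n, p0):
    """z statistic for H0: proportion == p0 (assumed normal approximation)."""
    p_hat = successes / n
    return (p_hat - p0) / math.sqrt(p0 * (1.0 - p0) / n)


# Hypothetical values; mu0 and p0 would come from the complete cohort.
t_stat, df = one_sample_t([2.1, 2.4, 2.2, 2.6, 2.3], mu0=2.3)
z_stat = one_sample_prop_z(successes=60, n=100, p0=0.5)
print(t_stat, df, z_stat)
```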

Due to heteroscedasticity of DNA-M measured by β values [32], β values were logit-transformed to M values using log2(β/(1 − β)) [46]. Lung function measurements (FVC, FEV1, and FEV1/FVC) at each age were adjusted for height and gender by regressing lung function on these two variables using the SAS 9.4 procedure PROC GLM (SAS, Cary, NC, USA).
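The logit transformation from β values to M values can be sketched as below. The function names and the clamping constant eps are assumptions added for illustration; the text does not describe how β values of exactly 0 or 1 were handled.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """M value = log2(beta / (1 - beta)).

    beta is clamped to [eps, 1 - eps] (an assumption) so that values of
    exactly 0 or 1 do not produce infinities.
    """
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


def m_to_beta(m):
    """Inverse transform: beta = 2^m / (1 + 2^m)."""
    return 2.0 ** m / (1.0 + 2.0 ** m)


for beta in (0.1, 0.5, 0.8):
    print(beta, beta_to_m(beta))
```

A β of 0.5 maps to M = 0, and the transform stretches values near 0 and 1, which is what reduces the heteroscedasticity noted above.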

In this study, we focused on lung-function-related CpGs. To achieve this goal, we first excluded CpGs which were not potentially associated with lung function. A screening package, ttScreening (training and testing screening, R package 3.3.2 version) [47, 48] was applied for this purpose. This method utilizes training and testing data in robust linear regressions with surrogate variables included in the regressions to adjust for unknown effects. For each lung function measure (FVC, FEV1, and FEV1/FVC), we performed the screening for each gender (males and females) at each age (10 and 18 years).

DNA-M measured in peripheral blood might be influenced by the cellular composition of blood samples, different batches for DNA-M measurement, and technical variation in the process of analyzing DNA samples. To adjust for the impact of these factors on DNA-M, linear regressions were applied at ages 10 and 18 years with DNA-M as the outcome variable and with cell type proportions, batch information, and the top 15 principal components of the control probes as independent variables. Cell type proportions (CD4+ T cells, CD8+ T cells, natural killer cells, B cells, monocytes, neutrophils, and eosinophils) were inferred from methylation data for each sample using the R computing package minfi [31, 49]. After estimating the adjusted DNA-M for each age (10 and 18 years), differences in the adjusted DNA-M between ages 10 and 18 were calculated (DNA-M at age 18 – DNA-M at age 10) and included in subsequent analyses.
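The adjustment and differencing steps can be sketched for a single CpG as follows. This is an illustration under stated assumptions, not the study's code: adjusted DNA-M is taken to be the regression residual, the two ages are assumed to hold the same subjects in the same order, and all function names and example numbers are hypothetical.

```python
def ols_fit(X, y):
    """Least-squares coefficients for y ~ X via the normal equations.

    X is a list of rows that already include an intercept column.
    """
    n, p = len(X), len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
         + [sum(X[i][a] * y[i] for i in range(n))] for a in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank-deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[k][p] / A[k][k] for k in range(p)]


def residuals(covariates, y):
    """Residuals of y after regressing on the covariates plus an intercept."""
    X = [[1.0] + list(row) for row in covariates]
    coef = ols_fit(X, y)
    return [yi - sum(c * x for c, x in zip(coef, xi)) for xi, yi in zip(X, y)]


def adjusted_change(m10, cov10, m18, cov18):
    """Adjusted DNA-M at age 18 minus adjusted DNA-M at age 10, per subject."""
    r10 = residuals(cov10, m10)
    r18 = residuals(cov18, m18)
    return [b - a for a, b in zip(r10, r18)]


# Hypothetical M values and one covariate (e.g. a cell type proportion).
cov10 = [[0.20], [0.35], [0.30], [0.25], [0.40]]
cov18 = [[0.22], [0.30], [0.33], [0.28], [0.38]]
m10 = [1.10, 1.45, 1.20, 1.05, 1.60]
m18 = [1.30, 1.40, 1.55, 1.25, 1.50]
print(adjusted_change(m10, cov10, m18, cov18))
```

In practice the covariate rows would hold the cell type proportions, batch indicators, and the 15 control-probe PCs, and the fit would be repeated for every CpG.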

Finally, to explore whether the changes of DNA-M over the adolescence period from ages 10 to 18 years were associated with the change in lung function, a linear regression model was fitted for each lung function measure, stratified by gender. Changes in height- and gender-adjusted lung function from 10 to 18 years of age were treated as the outcome variable, and changes of the adjusted DNA-M at each CpG that passed screening were used as an independent variable and potential confounders as described above were included in the model. In all analyses, p-values were considered significant at a level of 0.05.

Replication analyses

CpGs identified in the IOW cohort were further tested in both the ALSPAC and BAMSE cohorts. Comparable analytical methods were applied except for the availability of some covariates. In ALSPAC, pet exposure, exposure to pollution, paracetamol use, and non-steroidal anti-inflammatory drugs use were not available, and in BAMSE, minimum age of puberty onset, pet exposure, exposure to pollution, and paracetamol use were not included in the final model.

Pathway analysis

For CpGs that showed consistent directions of association in the ALSPAC and BAMSE cohorts, the nearest gene was identified based on the Illumina array manifest file and SNIPPER (https://csg.sph.umich.edu/boehnke/snipper/) version 1.2. Bioinformatic assessment of the genes was conducted using the online bioinformatics tool ToppFun, available in the ToppGene Suite [50]. Multiple testing was adjusted for by controlling the false discovery rate (FDR) at 0.05.
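The text does not name the procedure used to control the FDR. As one common choice, the Benjamini-Hochberg step-up procedure is sketched below; its use here is an assumption, and the function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans marking hypotheses rejected at FDR level q.

    Step-up rule: find the largest rank k with p_(k) <= k * q / m and
    reject the k smallest p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank * q / m:
            k_max = rank
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values; not study results.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(pvals))
```

Note that several p-values below 0.05 in this example are not rejected, which is the intended effect of adjusting for multiple testing.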

Results

Results from the IOW cohort

In total, 320 participants at age 10 years and 453 at age 18 years were included in the analyses for screening in the IOW cohort with available DNA-M and lung function data (Table 1). The mean values of FVC, FEV1, FEV1/FVC, height, and BMI for subjects in the present study were not significantly different from participants of the whole cohort with lung function at ages 10 (n = 980) and 18 (n = 838) years (Table 1) and for males and females separately with lung function at ages 10 (males = 488, females = 492) and 18 (males = 395, females = 443) (Table 2). Proportions of subjects who smoke or formerly smoked were also comparable to those in the complete cohort (Tables 1 and 2). One exception is that at age 10 years, a higher proportion of males were included in the present study compared to the whole cohort (Table 1).

Table 1 Characteristics of subjects with available methylation data with their lung function of the IOW cohort
Table 2 Characteristics of subjects with methylation data and lung function of IOW cohort, stratified by gender

To identify candidate CpGs potentially associated with lung function at ages 10 and 18 years, we applied ttScreening to the 402,714 CpGs in each gender. Three lung function parameters were considered in the screening process: FVC, FEV1, and FEV1/FVC. At age 10 years, across all three lung function parameters, a total of 361 distinct CpGs passed screening (157 CpGs for males and 204 CpGs for females), and at age 18 years, 530 distinct CpGs passed screening (274 CpGs for males and 256 CpGs for females). The breakdown of the numbers of CpGs that passed screening for each lung function parameter is given in Fig. 1. Combining the CpGs that passed the screening at either time point for each gender and each lung function measurement, 431 distinct CpGs in males (178 CpGs for FVC, 151 for FEV1, and 122 for FEV1/FVC) and 460 distinct CpGs in females (174 CpGs for FVC, 158 for FEV1, and 161 for FEV1/FVC) were included in the subsequent analyses. There were no common CpGs between the 431 and 460 CpGs identified in males and females.

Fig. 1 Flow chart of statistical analyses and the number of CpGs after each analysis. Note: 1) *Numbers of significant CpGs are given in the order FVC, FEV1, and FEV1/FVC changes, respectively. 2) **At age 10 years, for males, between FVC and FEV1, and between FEV1 and FEV1/FVC, 8 and 3 CpGs overlap, respectively; for females, between FVC and FEV1, 21 CpGs overlap in the screening. 3) At age 18 years, for males, between FVC and FEV1, and between FEV1 and FEV1/FVC, 8 and 1 CpGs overlap, respectively; for females, between FVC and FEV1, between FEV1 and FEV1/FVC, and between FVC and FEV1/FVC, 9, 1, and 2 CpGs overlap, respectively, in the screening

Linear regression models were applied to assess the association of change in DNA-M at each of the screened CpGs with the change of each lung function parameter (FVC, FEV1, and FEV1/FVC) for males (n = 169) and females (n = 132) separately. For females, after adjusting for multiple testing by controlling the FDR at 0.05, 42 CpGs showed a statistically significant association with FEV1/FVC change, but for FEV1 and FVC, we did not identify any statistically significant CpGs. At these 42 CpGs, a larger increase in DNA-M was associated with a larger decrease in FEV1/FVC in females. From childhood to adolescence, FEV1/FVC is generally constant or falls linearly with age because FVC has a proportionately greater increase than FEV1 [51], which supports our findings. For males, no CpG survived multiple testing for any of the three lung function parameters. The 42 CpGs identified in females in the IOW cohort were further tested in the ALSPAC and BAMSE cohorts.

Results from the ALSPAC cohort

In total, 345 female (n = 935) participants in the ALSPAC had FEV1/FVC measurements and DNA-M measurements at both 7/8 years and 15 years old. Of the 42 CpGs examined, DNA-M changes at 16 CpGs (Table 3) showed consistent associations with FEV1/FVC changes (in terms of regression coefficients) compared to those observed in the IOW cohort (Fig. 2, Table 3), although not statistically significant at the 0.05 level. These 16 CpGs were noted as IOW-ALSPAC consistent CpGs. The complete results of this analysis were included in Additional file 1: Table S1.

Table 3 CpGs showing consistent associations in females between the IOW and replication cohorts, ALSPAC and BAMSE
Fig. 2 Barplots of coefficients of IOW-ALSPAC and IOW-BAMSE consistent CpGs with their mapped genes in females. Note: The coefficients are shown for the association of DNA-M changes with changes in lung function (FEV1/FVC) in females during adolescence. Mapped genes of the CpGs showing consistent associations between the IOW and ALSPAC cohorts (left panels) and between the IOW and BAMSE cohorts (right panels) are included. Gene names that overlap among the three cohorts are given in red font

Results from the BAMSE cohort

In the BAMSE cohort, 48 female participants had lung function and DNA-M data at ages 8 and 16 years, and DNA-M at 41 of the 42 CpGs was available in these 48 females. At 22 of the 41 CpGs, the associations of DNA-M changes with changes in FEV1/FVC were consistent with the findings in the IOW cohort, with one CpG showing statistical significance at the 0.05 level (cg14552568) and two CpGs approaching significance (cg01082111 and cg10027934, p-value < 0.1). These 22 CpGs were noted as IOW-BAMSE consistent CpGs, of which 11 were among the 16 IOW-ALSPAC consistent CpGs. These 11 CpGs were further noted as IOW-ALSPAC-BAMSE consistent CpGs.
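The consistency criterion used across cohorts (agreement in the direction of the regression coefficient) and the overlap between the two consistent sets can be sketched as follows. The function names, CpG labels, and coefficients are hypothetical placeholders, not study results.

```python
def same_direction(discovery_coef, replication_coef):
    """True when both coefficients are non-zero and share a sign."""
    return discovery_coef * replication_coef > 0


def consistent_cpgs(discovery, replication):
    """CpG IDs present in both cohorts whose coefficients share a sign."""
    return {cpg for cpg, coef in discovery.items()
            if cpg in replication and same_direction(coef, replication[cpg])}


# Hypothetical coefficients keyed by placeholder CpG labels.
iow = {"cpgA": -0.8, "cpgB": -0.5, "cpgC": -0.3, "cpgD": -0.2}
alspac = {"cpgA": -0.1, "cpgB": 0.2, "cpgC": -0.4, "cpgD": -0.05}
bamse = {"cpgA": -0.6, "cpgB": -0.3, "cpgC": 0.1}  # cpgD not available

iow_alspac = consistent_cpgs(iow, alspac)
iow_bamse = consistent_cpgs(iow, bamse)
all_three = iow_alspac & iow_bamse  # consistent in both replication cohorts
print(sorted(iow_alspac), sorted(iow_bamse), sorted(all_three))
```

A CpG missing from a replication cohort, like the one CpG unavailable in BAMSE, simply cannot enter that cohort's consistent set.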

Findings of the biological pathway analysis

Genes mapped to CpGs that showed a consistent direction of association in either of the two replication cohorts (ALSPAC and BAMSE) were included in the pathway analyses. The 16 IOW-ALSPAC consistent CpGs were mapped to 16 genes, and 22 genes were identified for the 22 IOW-BAMSE consistent CpGs (Table 3). The selected 16 and 22 genes were further investigated for functional enrichment in biological processes using the bioinformatics tool ToppFun.

In total, eight biological processes were identified at an FDR-adjusted p-value of 0.05 (Table 4). Eight genes, CELF4, INSIG1, PTCH1, RPS6KA4, ZNF304, RARA, IKBKB, and BANP, to which the IOW-ALSPAC consistent CpGs were mapped, were involved in most of the eight biological processes. The same biological processes were found to involve the genes CELF4, INSIG1, PTCH1, RPS6KA4, ZNF304, DLX5, WWOX, and ASH1L corresponding to the IOW-BAMSE consistent CpGs, although they did not survive multiple testing.

Table 4 Biological processes were identified from the mapped genes based on the IOW-ALSPAC consistent CpGs

Discussion

Few studies have focused on longitudinal lung function and DNA-M measurements during adolescence, an important period of life that significantly contributes to lung function development [36, 43]. The present study is the first genome-scale exploration of the association of changes in DNA-M with changes in lung function during adolescence, stratified by gender. We showed that DNA-M changes at 11 CpGs were associated with changes in FEV1/FVC in females in adolescence, based on findings from the IOW cohort and two independent cohorts. Such associations were not identified in males. It is important to mention that the final results focused on the direction of associations rather than statistical significance, as the non-equivalence of statistical significance and clinical significance has been recognized [52, 53]. We suggest that in replication studies agreement in clinical significance should be more important than statistical significance, although it is most desirable when agreement in clinical significance is accompanied by statistical significance.

Among the genes involved in the identified biological processes based on the findings in both ALSPAC and BAMSE cohorts, the genes INSIG1, PTCH1, and PTPRN2 have been shown in a range of studies to be involved in lung development, lung function, and inflammatory airway diseases such as asthma and COPD [54,55,56,57,58,59,60], although most findings were not specifically linked to adolescence. The gene INSIG1, associated with cg15575249, encodes the protein insulin induced gene 1, which plays a significant role in regulating lipogenesis in alveolar type 2 cells, consistent with the roles of sterol regulatory element-binding protein (SREBP)/sterol cleavage-activating protein in lung lipid synthetic pathways [54]. INSIG1 is primarily involved in epithelial development and surfactant physiology during the perinatal period [55]. The findings in our study further emphasize its importance in the change of lung function in adolescence.

The gene PTCH1, associated with cg14319249, encodes a member of the patched family of proteins that functions as a receptor and a component of the hedgehog (Hh) signaling pathway [56,57,58]. The Hh signaling pathway is crucial in embryonic lung development processes, including the morphogenesis of the lung and the regulation of the interaction between epithelial and mesenchymal cell populations in the airway and alveolar compartments [56,57,58]. Sonic Hh (one type of Hh signaling) is active in adult lung function [57, 58], but to our knowledge, its relation to lung function changes in adolescence has not been examined before. The link of PTCH1 with FEV1/FVC was also established in a genome-wide association study meta-analysis by the CHARGE consortium [59]. CpG cg21584493 is mapped to the gene PTPRN2. In a recent study, a differentially methylated region (DMR) annotated to PTPRN2 was identified for its association with lung function and asthma in children [60]. Findings in our study on these genes (INSIG1, PTCH1, and PTPRN2) further emphasize their epigenetic contribution to the changes in lung function in adolescence.

CpGs cg11316510 and cg09573852 on the genes RARA (retinoic acid receptor alpha) and IKBKB (Inhibitor of Nuclear Factor Kappa B Kinase), respectively, were among the IOW-ALSPAC consistent CpGs but not on the list of IOW-BAMSE consistent CpGs. Their significant involvement in lung function, as well as in lung function development and pulmonary diseases such as asthma and COPD, indicates the potential importance of these two CpGs and their mapped genes [11, 61,62,63,64,65,66,67,68,69,70,71]. RARA is the predominant isotype of the retinoic acid receptor (RAR) identified in alveolar type II epithelial cells and a component of the retinoic acid signaling pathway [63,64,65,66,67,68]. The retinoic acid signaling pathway plays important roles in lung development and alveolarization, and regulates surfactant protein B gene expression in pulmonary epithelial cells. Adolescence is a period accompanied by significant lung function development, and the functionality of this pathway supports the findings in our study. One of our recent studies also showed an epigenetic association of RARA with FEV1/FVC [11].

IKBKB is an enzyme complex that forms part of the nuclear factor-kappa B signaling pathway, which has been considered the master regulator of immune responses and demonstrated to play a cardinal role in allergic airways diseases [69,70,71]. In addition, gene IKBKB was required for the IL17-dependent signaling that was associated with neutrophilia and pulmonary inflammation [72].

It is worth noting that the genes discussed above were based on the findings in females in our study. For CpGs located on those genes, no statistically significant associations were shown in males. The identified unique 11 CpGs in three population-based cohorts thus have the potential to serve as epigenetic markers related to lung function development during adolescence in females, but not in males. The absence of such epigenetic associations in males led us to postulate the possibility of either different underlying epigenetic mechanisms in each gender in the regulation of gene activity, or that these CpGs are biomarkers of female physiology and/or exposures that influence lung function growth in adolescence. Thus, our findings may help to explain the various gender-associated health conditions related to lung function development in adolescence, such as gender reversal of asthma incidence in males and females.

There are some limitations of this study. Firstly, DNA-M measurements were made in peripheral blood leukocytes and provide no insight into epigenetic changes in structural cells of the airway. Secondly, concurrent instead of time-lagged modeling was applied to assess the association of DNA-M changes with lung function changes for each gender. In this context, we were not able to examine the potential of changes in DNA-M at the identified CpGs to predict lung function changes. In the IOW cohort, the analyses were based on data collected at ages 10 and 18 years representing pre- and post-adolescence. In the two replication cohorts, however, the corresponding ages were 7–8 years and 15 years for ALSPAC and 8 and 16 years for BAMSE. It is likely that many participants at age 15/16 years were still in the transition period or even just started puberty. This possibility accompanied by potentially significant changes in DNA-M during adolescence [23] might explain the non-replication of some CpGs identified in the IOW cohort. Other potential contributors to this non-replication may include some covariates being unavailable in the replication cohorts as well as variable characteristics unique to each cohort. On the other hand, the 11 CpGs showing consistent associations across all the three cohorts certainly deserve further assessment of their generalizability, as well as on the potential of predicting lung function changes.

Conclusions

This epigenetic study represents an integrated strategy to understand lung function changes in males and females during adolescence. We identified 11 CpGs as potential markers for lung function development, which are applicable to females only. Findings from the study provide insight into the role of epigenetics in gender-dependent lung function development during this critical period of life and thus provide a strong foundation to evaluate the gender reversal of asthma from male to female in the adolescence period. In subsequent studies, the detected 11 CpGs could serve as candidate epigenetic markers to predict changes in lung function during adolescence.