Introduction

The obesity epidemic continues to grow globally despite increasing awareness of it [1]. Obesity increases the risk of many chronic diseases among children and adults [1]. Asthma is a common chronic respiratory condition that predominantly originates in early childhood [2]. Several longitudinal epidemiological studies have identified obesity as a major risk factor for asthma [3]. A dose–response relationship between elevated body mass index (BMI) and asthma incidence has been demonstrated in a meta-analysis of prospective epidemiologic studies [4]. During puberty, a gender reversal in asthma prevalence has been observed, with a higher prevalence among boys before puberty and a higher prevalence among girls after puberty [5, 6].

Many recent studies have suggested a role of epigenetic programming in relation to both obesity [7] and asthma [8]. One of the most widely studied epigenetic mechanisms is DNA methylation (DNAm), which is known to respond to environmental exposures. DNAm is a potentially reversible process in which a methyl group is attached to a nucleotide, and it can result from both genetic and environmental factors. DNAm at specific cytosine-phosphate-guanine (CpG) sites has been found to be associated with BMI [9, 10] and asthma [11, 12]. Recent investigations have suggested that changes in DNAm in blood and in adipose tissue are primarily the consequence of BMI [13, 14] rather than the other way around.

We have previously demonstrated that subjects with high BMI over time have a higher risk of asthma [15]. Given that increased BMI is associated with asthma, and both these chronic conditions are associated with DNAm, we hypothesized that DNAm mediates the association of BMI trajectories before adolescence with asthma incidence in young adulthood (Fig. 1). Path analyses were utilized to examine the mediation effects of DNAm. Studies reporting asthma incidence by sex are heterogeneous regarding the role of sex in the obesity-asthma relationship; some showed significant associations between obesity and asthma regardless of sex [16, 17], while others demonstrated associations only in males [18, 19] or only in females [20, 21, 22]. Thus, in this study, we stratified the analyses by sex in order to focus on the assessment of epigenetic mediation effects.

Fig. 1

Path analyses assessing DNAm mediation effect on the association of BMI trajectory and asthma incidence. a Effects of BMI-trajectories on methylation of CpGs, controlled for secondhand smoking status at 1, 2 and 4 years. b Effects of CpGs on the incidence of asthma, controlled for BMI trajectories, socio-economic status (SES), active smoking status at 18 years, pubertal events (age at onset of voice deepening in males and age at onset of menarche in females). c Direct effects of BMI trajectories on asthma acquisition

Methods

Study population

A population birth cohort was established on the Isle of Wight (IoW), UK, to prospectively study the natural history of allergic diseases among children. Parents of all infants born from the 1536 pregnancies between January 1, 1989, and February 28, 1990, were contacted at birth, and 1456 infants were subsequently enrolled following informed consent and exclusions. Follow‐ups for survey and clinical data were conducted at ages 1, 2, 4, 10, and 18 years. The IoW birth cohort (IOWBC) has been described in detail elsewhere [23]. Detailed interviews and examinations were completed for each child at each follow-up.

Data collection—outcome, exposures, covariates

The International Study of Asthma and Allergy in Childhood (ISAAC) questionnaire was used to obtain information regarding asthma at 18 years [24]. The questions used to assess asthma were: ‘History of physician diagnosed asthma?’, ‘Wheezing or whistling in the chest in the last 12 months?’ and ‘Asthma treatment in the last 12 months?’ Based on responses to these questions, a participant was determined to have asthma if she/he had experienced recurrent wheezing in the last 12 months and had been given a clinical diagnosis of asthma by a physician, with or without being treated with asthma medications.

Asthma incidence at 18 years was the outcome of interest for this study and was defined as not having asthma by age 10 years but having developed asthma by age 18 years (no → yes). Subjects with asthma at ages 1, 2, 4 or 10 years were excluded to focus on the association of persistent BMI patterns in childhood with young adult asthma incidence. Height and weight of each participant were assessed at ages 1, 2, 4, and 10 years. Body mass index (BMI) was calculated at each age as weight in kilograms divided by the square of height in meters. Information regarding sex, maternal smoking during pregnancy, duration of breastfeeding, maternal and paternal asthma status, and age at specific pubertal events, i.e., age at onset of voice deepening in males and age at onset of menarche in females, was extracted from questionnaire data. Socio-economic status (SES) was defined based on household income, number of rooms and maternal education. Active smoking status at 18 years was recorded as either never smoker, current smoker or past smoker. Second-hand smoking exposure was determined using information obtained for tobacco smoke exposure from mother, father, or others at ages 1, 2, and 4 years.

DNA methylation

DNA was extracted from whole blood samples collected at 10 years of age using a standard salting out procedure [25]. For each sample, one microgram of DNA was bisulfite-treated for cytosine to thymine conversion using the EZ 96-DNA methylation kit (Zymo Research, Irvine, CA, USA), following the manufacturer’s standard protocol. Genome-wide DNAm for each CpG was assessed using either Illumina Infinium HumanMethylation450 BeadChips or the MethylationEPIC BeadChip (Illumina, Inc, San Diego, CA, USA), which interrogate > 484,000 and > 850,000 CpG sites, respectively. Arrays were processed using a standard protocol as described elsewhere [26], with multiple identical control samples assigned to each bisulfite conversion batch to assess assay variability. The BeadChips were scanned using a BeadStation, and the methylation level (beta (β) value) was calculated for each queried CpG locus using the Methylation module of BeadStudio software.

DNAm data were preprocessed using the CPACOR pipeline for data from both platforms (HumanMethylation450 and MethylationEPIC) [27]. Specifically, the DNAm intensity data were quantile‐normalized using the R package minfi [28]. Beta values were calculated as the ratio of methylated (M) probe intensity to the sum of methylated and unmethylated (U) probe intensities (β = M/[c + M + U], where c is a constant that keeps the denominator away from zero when M + U is very small). Beta values close to 0 or 1 tend to suffer from severe heteroscedasticity, and it has been demonstrated that base-2 logit transformed beta values (denoted as M-values) perform better in differential analysis of methylation levels [29]. Therefore, M-values were used to represent methylation levels in the analysis.
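
The relationship between probe intensities, beta values and M-values described above can be sketched as follows. This is a minimal illustration rather than the CPACOR or minfi implementation; the function names, the example intensities and the offset c = 100 are assumptions made for the example, not values taken from this study.

```python
import math


def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (c + M + U). The offset c is an assumed illustrative constant."""
    return meth / (offset + meth + unmeth)


def m_value(beta):
    """Base-2 logit of a beta value that lies strictly between 0 and 1."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log2(beta / (1.0 - beta))


# Hypothetical intensities for a single probe in a single sample.
beta = beta_value(meth=3000.0, unmeth=1000.0)
print(round(beta, 4), round(m_value(beta), 4))
```

A beta value of 0.5 maps to an M-value of 0, and the transformation stretches values near 0 and 1, which is the property that motivates its use in differential analysis.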

Principal components (PCs) were inferred from control probes to represent latent chip‐to‐chip and technical variation. Since DNAm data came from two different platforms, PCs were determined based on DNAm at shared control probes. In total, 195 control probes were shared between the two platforms (450 K and EPIC) and were used to calculate the control probe PCs, and the top 15 PCs were used to represent latent batch factors [27]. These 15 PCs were included in subsequent analyses. Probes not reaching a detection p-value of 10^-16 in at least 95% of samples were excluded. A comparable criterion was applied to exclude samples with low-quality DNAm measurements. CpGs on the sex chromosomes were excluded to avoid bias in our analyses. Probes that contained single nucleotide polymorphisms (SNPs) within 10 base pairs of a targeted CpG site with a minor allele in at least 0.7% of subjects (corresponding to at least 10 subjects in IoW with n = 1456) were excluded due to their influence on DNAm. After preprocessing, a total of 442,475 CpGs in common between the two platforms (450 K and EPIC) were included in the analyses.
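
The probe-level quality filter can be illustrated with a short sketch. It assumes the detection threshold reads as 10^-16 and that "reaching" the threshold means a detection p-value below it; the function name, probe identifiers and p-values are hypothetical and are not taken from the CPACOR pipeline.

```python
def passing_probes(detection_p, p_cutoff=1e-16, min_call_rate=0.95):
    """Return probes detected in a sufficient share of samples.

    detection_p maps a probe ID to a list of per-sample detection p-values.
    A probe is kept when the fraction of samples with p < p_cutoff is at
    least min_call_rate (both thresholds are assumptions stated above).
    """
    kept = []
    for probe, pvals in detection_p.items():
        n_detected = sum(1 for p in pvals if p < p_cutoff)
        if n_detected / len(pvals) >= min_call_rate:
            kept.append(probe)
    return kept


# Hypothetical data: 20 samples per probe.
example = {
    "probe_a": [0.0] * 20,               # detected in every sample
    "probe_b": [0.0] * 18 + [0.01] * 2,  # detected in 90% of samples
}
print(passing_probes(example))
```

The same pattern, applied across probes within each sample instead of across samples within each probe, would give a comparable sample-level filter.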

Since blood is a mixture of functionally and developmentally distinct cell populations [30], adjusting for cell type composition potentially mitigates confounding by cell heterogeneity in DNAm measured from blood samples [31]. To this end, we estimated cell type proportions using the method proposed by Jaffe and Irizarry [32], adapted from Houseman et al. [33], using the Bioconductor minfi package [28]. The estimated proportions of CD4+ T cells, natural killer cells, neutrophils, B cells, monocytes, and eosinophils were included in the analyses as confounding factors.

Genome-wide RNA-seq gene expression data generation

Gene expression levels in peripheral blood samples collected at 26 years from the IOWBC were determined using paired-end (2 × 75 bp) RNA sequencing with the Illumina Tru-Seq Stranded mRNA Library Preparation Kit with IDT for Illumina Unique Dual Index (UDI) barcode primers following the manufacturer’s recommendations. All samples were sequenced twice using the identical protocol, and for each sample the output from both runs was combined. FASTQC was run to assess the quality of the FASTQ files (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Reads were mapped against the human genome (GRCh37, version 75) using the HISAT2 (v2.1.0) aligner [34]. The alignment files, produced in the Sequence Alignment Map (SAM) format, were converted into the Binary Alignment Map (BAM) format using SAMtools (v1.3.1) [35]. HTseq (v0.11.1) was used to count the number of reads mapped to each gene in the same reference genome used for alignment [36]. Normalized read counts, expressed as FPKM (Fragments Per Kilobase of transcript per Million mapped reads), were calculated using the countToFPKM package (https://github.com/AAlhendi1707/countToFPKM), and the log-transformed values were used for data analysis.
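
As a rough guide to what the FPKM normalization does, the sketch below applies the textbook definition to a single gene. It is not the countToFPKM implementation, which may define gene length differently (for example, by correcting for fragment length). The log base and the pseudo-count of 1 are also assumptions, since the text only states that values were log-transformed.

```python
import math


def fpkm(count, gene_length_bp, total_mapped_fragments):
    """Textbook FPKM: fragments per kilobase of gene per million mapped fragments."""
    return count * 1e9 / (gene_length_bp * total_mapped_fragments)


def log_fpkm(value, pseudo_count=1.0):
    """Log2 transform with an assumed pseudo-count so that zero counts stay finite."""
    return math.log2(value + pseudo_count)


# Hypothetical gene: 500 fragments, 2 kb long, 25 million mapped fragments.
value = fpkm(count=500, gene_length_bp=2000, total_mapped_fragments=25_000_000)
print(value, round(log_fpkm(value), 3))
```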

Statistical analysis

To examine whether the analytic sample (n = 224) reasonably represents the complete cohort (n = 1456), chi-square tests for categorical variables and one-sample t-tests for continuous variables were applied, stratified by sex. In addition, all subsequent analyses were stratified by sex, given the gender reversal in asthma prevalence.

BMI trajectories

BMI trajectories were determined separately for each sex using BMI values at ages 1, 2, 4 and 10 years. Our study focused on temporal patterns of BMI. Thus, the less informative standardized BMI (or Z-BMI) was not needed [37]. Group-based trajectory modelling, also referred to as semiparametric mixture modelling [38, 39], was applied using PROC TRAJ in SAS [40] to identify BMI developmental paths in the form of trajectories across ages 1, 2, 4 and 10 years. The group-based trajectory method presumes that the data comprise latent distinct groups (trajectories) that summarize the distinct features as parsimoniously as possible [38, 39]. Models with one to three groups were estimated with linear and quadratic terms. The best-fitting model was selected based on the smallest Bayesian Information Criterion value. Individuals were assigned to one of the trajectories/groups based on their highest estimated group-membership probabilities.
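
The two selection steps at the end of this procedure, choosing a model by BIC and assigning each subject to the most probable group, can be sketched in a few lines. This is not PROC TRAJ: the sketch uses the common definition BIC = -2 logL + k ln(n), whereas software may report BIC on a different scale or sign convention, and the log-likelihoods, parameter counts and membership probabilities below are made-up illustrative numbers.

```python
import math


def bic(log_likelihood, n_params, n_subjects):
    """Common BIC definition; smaller values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_params * math.log(n_subjects)


def assign_group(posterior_probs):
    """Index of the trajectory group with the highest membership probability."""
    return max(range(len(posterior_probs)), key=lambda g: posterior_probs[g])


# Hypothetical candidate models: (label, log-likelihood, number of parameters).
candidates = [("1 group", -5200.0, 4), ("2 groups", -5050.0, 8), ("3 groups", -5045.0, 12)]
n = 600  # hypothetical number of subjects
best = min(candidates, key=lambda m: bic(m[1], m[2], n))
print(best[0])

# Hypothetical posterior membership probabilities for one subject.
print(assign_group([0.18, 0.82]))
```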

Association of BMI trajectories with asthma incidence

Subjects with asthma at ages 1, 2, 4 or 10 years were excluded from the analysis. We used multivariable logistic regression to evaluate the association of BMI trajectory (independent variable) with asthma incidence at 18 years (dependent variable), with the following covariates and confounders potentially associated with asthma incidence included in the model: SES, active smoking status at 18 years, height at 10 years, maternal smoking during pregnancy, duration of breastfeeding (in weeks), parental history of asthma and age at pubertal events. Age at voice deepening was included for males and age at menarche for females.

Screening for CpGs related to BMI trajectories

We regressed the M-values of DNAm at each CpG site on the aforementioned 15 PCs obtained from control probes and the 6 cell type proportions [33], separately for each sex, and used the residuals as batch- and cell-type-adjusted DNAm in subsequent analyses. We applied an R package, ttScreening, to screen CpGs at 10 years with DNAm potentially associated with BMI trajectory groups [41]. In the selection process, the method implemented in the package utilizes training and testing data in robust linear regressions. The screening was performed separately for each sex. Following the guideline [41], the minimum frequency for selecting an informative CpG site was set at 50%, i.e., a CpG site had to reach statistical significance in at least 50% of the randomly selected training and testing data set pairs. CpGs that passed the screening were treated as potential BMI-trajectory-associated CpGs.
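
The residualization step, regressing M-values on batch PCs and cell type proportions and keeping the residuals, can be sketched with ordinary least squares. The sketch assumes a plain linear model with an intercept and uses a single made-up covariate for brevity; in the study the covariates were the 15 control-probe PCs and the 6 cell type proportions. Function names and data are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def residualize(y, covariates):
    """Residuals of y after least squares on an intercept plus covariates.

    y holds one M-value per sample; covariates holds one row per sample.
    """
    design = [[1.0] + list(row) for row in covariates]
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    coef = solve_linear(xtx, xty)
    return [yi - sum(c * v for c, v in zip(coef, r)) for r, yi in zip(design, y)]


# Made-up M-values for four samples and one made-up covariate.
m_values = [1.0, 2.0, 2.0, 5.0]
covs = [[0.0], [1.0], [2.0], [3.0]]
print([round(r, 3) for r in residualize(m_values, covs)])
```

Because the model includes an intercept, the residuals sum to zero, so the adjusted DNAm values are centered within each sex-specific analysis.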

Screening for BMI-trajectory-related CpGs associated with asthma incidence

Subjects with asthma at ages 1, 2, 4 or 10 years were excluded from the analysis. We used multivariable logistic regression to evaluate the association of each potential BMI-trajectory-associated CpG (independent variable) with asthma incidence at 18 years (dependent variable), with the following covariates and confounders potentially associated with asthma incidence included in the model: BMI trajectory groups [42, 43], SES, active smoking status at 18 years, and age at pubertal events. Age at voice deepening was included for males and age at menarche for females. The significance level for this in-depth screening process was set at 0.05.

Path analyses

Using path analyses, we explored the association between BMI trajectories at 1, 2, 4, and 10 years and asthma incidence at 18 years, and whether this relationship was mediated by DNAm at 10 years (Fig. 1), with potential confounders included in each path. Goodness of fit was assessed using the following criteria: chi-square test p-value > 0.05, RMSEA < 0.05, and CFI > 0.95. The path coefficients (direct and indirect estimates) represent the partial correlation between the independent and dependent variables after adjusting for confounders and covariates used in the model above. An R package, MplusAutomation, was utilized to iteratively call Mplus from R to perform path analyses with each of the CpGs as a mediator (Fig. 1) [44].

Association of DNAm with gene expression

To evaluate the biological relevance of the identified mediating CpGs, the association between DNAm (in M-values) and expression of genes within a 500 kilobase-pair (kbp) window (250 kbp upstream and 250 kbp downstream of the CpG site) was evaluated at 26 years using linear regressions. Gene expression (n = 140) was the dependent variable, and DNAm was the independent variable.
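
Selecting the genes that fall inside the window around a CpG can be sketched as below. The text does not specify whether a gene must lie entirely inside the window, so the sketch assumes that any overlap with the interval from 250 kbp upstream to 250 kbp downstream qualifies; the gene names and coordinates are hypothetical.

```python
def genes_in_window(cpg_pos, genes, flank_bp=250_000):
    """Names of genes overlapping [cpg_pos - flank_bp, cpg_pos + flank_bp].

    genes is a list of (name, start, end) tuples on the same chromosome
    as the CpG; overlap-based inclusion is an assumption of this sketch.
    """
    lo, hi = cpg_pos - flank_bp, cpg_pos + flank_bp
    return [name for name, start, end in genes if start <= hi and end >= lo]


# Hypothetical gene coordinates around a CpG at position 1,000,000.
genes = [
    ("GENE_A", 700_000, 760_000),      # overlaps the upstream edge
    ("GENE_B", 1_240_000, 1_300_000),  # overlaps the downstream edge
    ("GENE_C", 1_250_001, 1_400_000),  # starts just outside the window
]
print(genes_in_window(1_000_000, genes))
```

Each selected gene would then enter its own linear regression, with expression as the dependent variable and the M-value of the CpG as the independent variable.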

Replication cohort—the Avon Longitudinal Study of Parents and Children (ALSPAC) cohort

CpGs shown to mediate the association of BMI trajectory groups with asthma incidence in the IOWBC were further assessed in an independent cohort, the Avon Longitudinal Study of Parents and Children (ALSPAC) [45, 46]. Women residing in the South West of England who were pregnant and expecting to deliver between April 1, 1991 and December 31, 1992 were eligible to be recruited. Of the 14,541 pregnant women eligible for recruitment, 13,761 were included in the study, with 10,321 participants having their DNA sampled. DNAm in the ALSPAC cohort was assessed using the Infinium HumanMethylation450 BeadChip. DNAm was pre-processed by correcting for batch effects using the minfi package [28] and removing CpGs with detection p-value ≥ 0.01. Samples with a sex mismatch based on X-chromosome methylation were flagged. Estimated proportions of CD4+ T cells, natural killer cells, CD8+ T cells, B cells, monocytes, and granulocytes were used in the analyses to adjust for cell heterogeneity. DNAm at 7 years was included in our study and its residuals were calculated by regressing M-values on cell type proportions. Please note that the study website contains details of all the data that are available through a fully searchable data dictionary and variable search tool (http://www.bristol.ac.uk/alspac/researchers/our-data/).

BMI trajectories were modeled at ages 1, 2, 4 and 7 years separately for each sex. Subjects with asthma at 7 or 10 years were excluded, and asthma incidence was assessed at 17 years. The same path analysis models as those applied in the IoW cohort were used with comparable covariates available in ALSPAC, including SES, active smoking status at 17 years and pubertal events. Secondhand smoking at 7 years was not considered due to low counts of asthma incidence in one of the secondhand smoking categories.

For the CpGs showing mediation effects, the genes annotated to the CpGs were summarized along with information such as gene location, chromosome number based on Illumina's manifest file and SNIPPER (https://csg.sph.umich.edu/boehnke/snipper/) version 1.2.

Results

We estimated BMI trajectories based on data from subjects with BMI available for at least two time points. In total, BMI data from 602 boys and 577 girls were included in the trajectory analyses. Based on the Bayesian Information Criterion, two BMI trajectories were identified for each sex that best summarized the complex developmental course of BMI across the first 10 years of life (Fig. 2). Since one trajectory included subjects potentially overweight or obese [47], we labelled that trajectory as a ‘high’ BMI trajectory, and the other as a ‘normal’ BMI trajectory. The distribution of variables used in the analysis is shown by BMI trajectory group in Additional file 1: Table S1. Results from logistic regressions indicated that subjects with a high BMI trajectory had significantly increased odds of asthma incidence at 18 years in males (OR = 8.27, p-value = 0.004) and females (OR = 4.89, p-value = 0.001) after adjusting for confounders and covariates.

Fig. 2

BMI trajectories across first 10 years of life in boys and girls respectively in IoW

The end point that we focused on in this study was asthma acquisition at, or close to, post-adolescence. Thus, in IOWBC, of the 1456 subjects in the cohort, 320 subjects had asthma at or before age 10 years and were excluded from the study. Of the remaining 1136 subjects, 122 male and 102 female participants also had DNAm data at 10 years and were included in the path analyses. Descriptive statistics indicated that the analytical subsamples represented the complete IoW birth cohort for all variables except for second-hand smoking in males (Table 1).

Table 1 Comparison of analytical subsample with complete cohort

To identify candidate CpGs potentially associated with BMI trajectory groups, ttScreening was applied to the 442,475 CpGs, stratified by sex, using residuals (batch- and cell-type-adjusted DNAm) at 10 years of age. In total, 159 CpGs in males and 212 CpGs in females passed screening. These CpGs were treated as potential BMI-trajectory-associated CpGs and were included in the subsequent analyses.

For each CpG that passed the screening, its association with asthma incidence at age 18 years was further evaluated. After controlling for potential confounders, at a significance level of 0.05, nine CpGs in males and six CpGs in females were found to be associated with asthma incidence at 18 years. These CpGs were tested using path analysis for their mediation of the effects of BMI trajectories at ages 1, 2, 4, and 10 years on asthma incidence at 18 years. One CpG in males and three CpGs in females showing such mediation effects were identified (Table 2). At two of the four CpGs (cg23632109 in males and cg10817500 in females), BMI trajectory showed only indirect effects on the risk of asthma incidence via DNAm at these two CpG sites, and no statistically significant direct effects were observed. To help understand the mediating effects of DNAm, Fig. 3 shows the direct effects for each path at each CpG site. At cg23632109 in males and at cg10817500 in females, the coefficients suggest that a high BMI trajectory is associated with high DNAm, which was further linked to an increased risk of asthma incidence at 18 years (Fig. 3). For the other two CpGs, cg03584646 and cg03508767 in females, both direct and indirect effects were statistically significant (Table 2). In particular, in terms of direct effects, a high BMI trajectory was associated with increased risk of asthma incidence at 18 years, but this association was attenuated by DNAm at cg03584646 and cg03508767, in that subjects with a high BMI trajectory tended to have higher DNAm at these two loci, which was further associated with lower risk of asthma.

Table 2 Effects of childhood BMI trajectories on asthma incidence in adulthood via pre-adolescence DNAm
Fig. 3

Indirect effects of childhood BMI trajectory on adulthood asthma incidence via DNAm at four CpGs. The figure shows the estimates (and p-values) of direct effects at each path, based on which indirect effects of BMI trajectory were inferred. For instance, the coefficient of 1.02 indicates that the methylation at cg23632109 in the high BMI trajectory group is 1.02 higher on average compared to the DNAm in the normal BMI trajectory group. The indirect effect of BMI trajectory via cg23632109 is obtained by 1.02*0.26 = 0.27 (Table 2). Goodness of fit criteria: Chi-square test p-value > 0.05, RMSEA < 0.05, CFI > 0.95 (except cg03584646: chi-sq. < 0.05, RMSEA = 0.15, CFI = 0.78). The first CpG is for males and the remaining three CpGs are for females
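
The worked example in the caption follows the product-of-coefficients approach, in which the indirect effect is the coefficient of the path from BMI trajectory to DNAm multiplied by the coefficient of the path from DNAm to asthma incidence. A minimal sketch of that arithmetic, using the two coefficients quoted for cg23632109 and a hypothetical function name:

```python
def indirect_effect(exposure_to_mediator, mediator_to_outcome):
    """Indirect effect as the product of the two path coefficients."""
    return exposure_to_mediator * mediator_to_outcome


# Coefficients quoted in the caption for cg23632109 (males).
print(round(indirect_effect(1.02, 0.26), 2))  # 1.02 * 0.26 = 0.2652, i.e. 0.27
```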

The association between the identified mediating CpGs and expression of the neighboring mapped genes was also evaluated. Of the 4 identified CpGs, significant associations were observed at 3 CpGs with 5 genes (Table 3). For 3 of the 5 genes, the association was negative, such that higher DNAm was associated with lower expression of the respective gene. For instance, in females, a one unit increase in DNAm at cg03508767 was associated with a 0.41-unit decrease in expression of the PCDH1 gene.

Table 3 Association of DNAm with expression of neighboring mapped genes

No neighboring genes of cg03584646 showed association with DNAm.

To assess reproducibility, these four CpGs were further tested in the ALSPAC cohort. As in IOWBC, BMI trajectories were derived in the ALSPAC cohort using BMI at 1, 2, 4 and 7 years (Additional file 1: Figure S1). At all four CpG sites, the directions of indirect and direct effects were consistent with those identified in the IoW cohort, although none of the indirect effects was statistically significant at the level of 0.05 (Table 2). These four CpGs were mapped to their four nearest genes (Table 2).

Discussion

We assessed the direct and indirect effects of childhood BMI trajectory on post-adolescence asthma incidence via DNAm. Two BMI trajectories, a high BMI trajectory and a normal BMI trajectory, were identified in both the discovery cohort (IOWBC) and the replication cohort (ALSPAC). The two trajectories were similar at early ages, and as the children grew, the difference between trajectories increased; some children were overweight or obese by 10 years and hence belonged to the high BMI trajectory. In the IOWBC, DNAm at four CpGs was shown to mediate the association of BMI trajectories with asthma incidence at age 18 years. At two of the four CpGs (i.e., cg23632109 in males and cg10817500 in females), only indirect effects of BMI trajectory on asthma incidence were observed, and a high BMI trajectory was positively associated with high risk of asthma incidence via high DNAm. It is also interesting to note that the CpGs showing indirect effects differed between males and females, which might be due to potentially different underlying mechanisms of asthma incidence between the two sexes. At the other two CpGs (cg03584646 and cg03508767), the total effects, encompassing both direct and indirect effects, remain consistent with existing literature, such that subjects with a high BMI trajectory had greater odds of asthma incidence [15]. Nevertheless, the strength of this association was attenuated at these two sites. Although the longitudinal study design cannot prove causality, we postulate that these two CpGs may be protective, and further investigation of this postulation is needed. The same direction of mediation effects was observed at all four CpGs in ALSPAC, although the effects were not statistically significant.

Assessment of the biological relevance of the identified mediating CpGs indicated a potential epigenetic regulatory role of these CpGs in the expression of their neighboring genes. Previous research has shown that common genetic variations in the PCDH1 gene increase the risk of developing asthma [48] and bronchial hyperresponsiveness [49]. It has also been shown that polymorphisms in TAS2Rs (TAS2R5) may predict outcomes and therapeutic responses in asthmatic individuals [50]. Given our findings, it is possible that the correlation between gene expression and the identified CpGs corresponding to these two genes was due to genetic effects via methylation quantitative trait loci (methQTL) [51, 52], and further in-depth assessment is certainly warranted.

Three of the four identified genes (Table 2), TBC1D16, TBC1D8, and RASA2, have been previously implicated in relation to asthma and/or obesity/BMI, supporting the relevance of the genes identified in this study. For example, the interleukin-6 receptor (IL-6R) is linked to increased risk of asthma [53] and obesity [54], and its expression is positively associated with expression of TBC1D8 and negatively with expression of TBC1D16 [55]. One study noted that DNAm of TBC1D8 was associated with obesity [56], while others showed that expression of TBC1D8 was positively associated with obesity in abdominal and gluteal subcutaneous adipose tissue [57]. Many studies have suggested a connection between RASA2 and BMI [58, 59], as well as its association with atopy and asthma [60]. While these previous studies have focused on the association of gene expression and/or genetic variation in these genes with BMI and asthma, the results of this study, from the angle of epigenetics, suggest that these genes may act as potential epigenetic mediators between BMI trajectories and asthma.

Adolescence is a period in which both males and females experience rapid growth and in which there are clear sex-specific changes in asthma incidence. The availability of asthma status at two key time points, pre- and post-adolescence, offered us the opportunity to examine asthma incidence during this critical period. In comparison to measuring BMI at discrete time points, BMI trajectories allow for the dynamic visualization of BMI changes over time for certain groups of participants and allow researchers to follow similar developmental patterns of BMI over age, thereby reflecting unique features of each group. The use of BMI trajectories allows for the simultaneous consideration of intensity, age of onset and duration of adiposity, which may improve the predictability of future BMI patterns. To our knowledge, this is the first study to examine whether DNAm at pre-adolescence mediates the association of BMI trajectories in childhood with asthma incidence in young adulthood. This study design with a unique time order, spanning pre-adolescence to post-adolescence, allowed for the dissection of the total effects of BMI trajectory on the risk of asthma and of whether and how DNAm affects this relationship.

The direction of direct and mediating effects was consistent between the two cohorts at all four CpG sites identified in the IoW cohort. Statistical significance, however, was not observed at those CpG sites in the ALSPAC cohort. One possible reason for the lack of statistical significance in the ALSPAC cohort is the difference in the ages of assessment for both asthma and BMI between ALSPAC and the IoW cohort. In ALSPAC, BMI up to age 7 years was included in trajectory analyses and DNAm was assessed at age 7 years, while in IOWBC, BMI up to age 10 years was included and DNAm was assessed at age 10 years. It is possible that underlying epigenetic mechanisms at age 7 years were not strong enough to be detected. On the other hand, whether statistical significance is more important than clinical significance has been debated, which is relevant in situations such as ours where the data could not satisfy both [61]. In terms of marker detection, we feel that agreement in clinical significance, i.e., agreement in the direction of associations, is more important, since the behavior of those markers is linked to the risk of asthma incidence. CpGs with opposite directions of association between the discovery and replication cohorts lose their potential to serve as markers. In our study, at all the discovered CpGs, consistent directions of associations (both direct and indirect effects) were found in the replication cohort, ALSPAC. If this had happened by chance, the probability of this strong coincidence would be 0.0039 (calculated by multiplication of the p-values for direct and indirect effects in ALSPAC, i.e., 0.92*0.37*0.60*0.48*0.23*0.31*0.87*0.65 = 0.0039). Such a low probability indicates that this result is very unlikely to be solely due to chance. Nevertheless, the statistically insignificant findings in the ALSPAC cohort indicate that caution is needed when generalizing the findings. Further assessment of these CpGs in large scale studies and different populations will benefit the validity and generalizability of the identified markers.
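
The multiplication of the eight ALSPAC p-values quoted above can be reproduced with a few lines. The sketch only repeats the arithmetic stated in the text; the variable names are illustrative.

```python
# The eight p-values for direct and indirect effects in ALSPAC, as quoted in the text.
p_values = [0.92, 0.37, 0.60, 0.48, 0.23, 0.31, 0.87, 0.65]

product = 1.0
for p in p_values:
    product *= p

# The product is about 0.00395, which the text reports as 0.0039.
print(f"{product:.5f}")
```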

Our study has some limitations. The two BMI trajectories inferred in the IOWBC and ALSPAC were based on their best fit. Since both birth cohorts are population based, we expect the trajectories to reflect populations with similar features. On the other hand, the demographics and age range in this study may limit the external validity of the findings, and hence generalization of these trajectories and related findings to other populations should be made with caution. In one of our earlier studies [15], four trajectories were identified, “normal”, “early persistent obesity”, “delayed overweight”, and “early transient overweight”, which had more detailed features compared to the two trajectories inferred in the current study. In light of our genome-scale analyses, in addition to a carefully planned screening process, we focused on parsimonious trajectories to avoid power loss and to detect epigenetic factors potentially associated with overweight or obesity in general. The CpGs included in the path analysis model were pre-selected based on their association with BMI trajectory and asthma incidence during the screening process. Because of these detailed considerations, no adjustment for multiple testing was made in the discovery phase. However, large scale studies are certainly warranted to scrutinize the connection between epigenetics and detailed BMI trajectories. In addition, we evaluated the contribution of each CpG site individually. These CpGs may be correlated and jointly impact asthma incidence, which could not be addressed by the present study. Although future studies are warranted to further examine the credibility of the identified CpGs, the consistency in the results between the two cohorts indicates a role for epigenetics in the association of obesity and asthma. The identified CpGs have the potential to improve our understanding of the underlying biological pathways in the connection between obesity and asthma incidence during adolescence.