Background

With the rise of available DNA methylation (DNAm) data in multiple cohort studies, the number of epigenome-wide association studies (EWAS) demonstrating a connection between DNAm and allergic diseases has increased. Over the last decade, EWAS have reported associations of DNAm at single CpG sites (DNAm being the addition of a methyl group to a cytosine in the context of CpG dinucleotides) with several allergic outcomes: high total immunoglobulin E (IgE) [1, 2], an antibody involved in Type I immune response and highly associated with allergic diseases; specific IgE against certain aeroallergens [3]; specific IgE plus skin-prick test [4]; and, in meta-analyses, asthma [5] and any allergic disease [6]. Many of these CpGs have been successfully replicated in independent cohorts, and we verified the robustness of these findings by replicating significant hits in the German LISA study [7].

However, it is unknown whether DNAm changes occur in response to allergic disease or whether differential DNAm can serve as a predictor of future development of allergies. Looking at aeroallergen sensitization, an objectively measured indicator of allergic diseases, we previously reported that methylation risk scores (MRS), which are defined as a weighted sum of methylation beta values, can be considered as cross-sectional biomarkers of current sensitization [7]. However, the predictive capabilities in prospective associations with aeroallergen sensitization were limited, indicating that DNAm might be a result rather than a predictor of allergic sensitization. On the other hand, studies investigating DNAm in cord blood found associations with higher IgE levels later in life [8, 9], indicating a certain predictive potential.

One way to investigate this “chicken or egg: what came first?” question is a causal mediation analysis with data on exposure, mediator and outcome from three consecutive time points. Known determinants of allergic disease that can be used as exposures in such mediation analyses include genetic and environmental factors. Allergic diseases are highly heritable, with reported heritability estimates as high as 91.7% for asthma [10], 90% for atopic dermatitis [11], 91% for allergic rhinitis and 68% for specific serum IgE (reviewed in Ober and Yao [12]). Additionally, numerous genetic variants associated with allergic diseases have been identified in multiple genome-wide association studies (GWAS), e.g., for atopic dermatitis [13], rhinitis [14] or any allergic disease [15, 16]. Polygenic risk scores (PRS) have been proposed to summarize genetic susceptibility to allergic diseases in one score for allergic trajectories [17] or asthma prediction [18, 19], showing a significant association and a predictive area under the curve of up to 0.59 for early transient asthma phenotypes and 0.58 for intermediate-onset wheeze [18]. However, because genetic variation in complex diseases increases risk without guaranteeing disease onset, unlike in monogenic diseases, family history of allergic diseases can additionally be considered as a proxy for the combination of allergic inheritance and environmental risk.

Among environmental risk factors, maternal smoking during pregnancy is well established: it has been shown to influence allergic outcomes, especially asthma [20], and has also been biologically validated in preclinical mouse models [21].

A methodological challenge of investigating the “chicken or egg” question in causal mediation analyses is the high dimensionality of DNAm data, with up to 850K CpG sites measured on the most recent Illumina DNAm arrays (Illumina MethylationEPIC BeadChip microarray). Several approaches have been proposed to address high dimensionality in mediation analysis, including (1) dimension-reduction methods, e.g., by using MRS, (2) integration of prior knowledge by focusing only on CpG sites with a known association with the exposure or outcome (or both) and (3) hypothesis-generating high-dimensional mediation analyses (HMA).

The objective of this study is to determine the causality of the observed associations between changes in DNAm and the development of allergen sensitization using HMA and MRS. We conduct different HMA at two subsequent time points using well-established determinants of allergic disease (maternal smoking during pregnancy, family history of allergies and a PRS for any allergic disease) as exposures and prospective measurements of DNAm and aeroallergen sensitization as mediators and outcomes.

Methods

Study population

For this study, we used data from a population-based German birth cohort on the Influence of Life-style factors on Development of the Immune System and Allergies in East and West Germany (LISA). From 1997 to 1999, a total of 3,097 full-term healthy newborns were recruited at four study centers (Munich, Wesel, Leipzig and Bad Honnef). The study was approved by local ethics committees (Bavarian Board of Physicians, Board of Physicians of North-Rhine-Westphalia and Medical Faculty of the University of Leipzig) and written, informed consent was obtained from the parents or legal guardians. In the present study, only data from participants enrolled in the Munich study center with parental consent for genetic analyses at both six and ten years are included (Nmax = 240).

Aeroallergen sensitization

Positive aeroallergen sensitization was defined as a specific IgE threshold of > 0.35 kU/L (at least Radio-Allergo-Sorbent-Test (RAST) class one), measured for a mix of common aeroallergens (SX1 mix: Dermatophagoides pteronyssinus, cat, dog, rye, timothy grass, Cladosporium herbarum, birch and mugwort). Serum at six, ten and 15 years was analyzed using the CAP-RAST FEIA system (Pharmacia Diagnostics, Freiburg, Germany) according to the manufacturer’s instructions.

Risk factors for aeroallergen sensitization

Genome-wide data in the LISA study were measured using the Affymetrix Chip 5.0 and 6.0 (Thermo Fisher Scientific, USA). More information on genetic data can be found in the supplementary material of Grosche et al. [22]. We calculated a PRS for any allergic disease based on the genome-wide significant hits reported in Ferreira et al. [15, 16]. Single nucleotide polymorphisms (SNPs) were extracted for each participant and weighted with the reported effect size. Multiallelic SNPs, highly correlated variants (Linkage disequilibrium R2 > 0.7), those with a low imputation quality (< 0.4) or a minor allele frequency of less than 1% were excluded. Further information on quality control and PRS calculation can be found elsewhere [7, 23].
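The PRS construction described above (filter variants, then sum the effect-size-weighted genotypes) can be illustrated with a minimal sketch. This is not the pipeline used in the study: the `Variant` class, the SNP identifiers, weights and dosages are all hypothetical, and linkage-disequilibrium pruning is assumed to have been done beforehand and is not shown.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    snp_id: str                # hypothetical identifier
    weight: float              # reported effect size used as weight
    maf: float                 # minor allele frequency
    imputation_quality: float  # imputation quality score
    multiallelic: bool = False


def passes_filters(v, min_maf=0.01, min_quality=0.4):
    """Keep biallelic variants with MAF >= 1% and imputation quality >= 0.4."""
    return (not v.multiallelic) and v.maf >= min_maf and v.imputation_quality >= min_quality


def polygenic_risk_score(dosages, variants):
    """Sum of weight * allele dosage over variants that pass the filters."""
    kept = [v for v in variants if passes_filters(v)]
    return sum(v.weight * dosages[v.snp_id] for v in kept)


# Hypothetical example data for one participant.
variants = [
    Variant("snp_a", 0.10, maf=0.25, imputation_quality=0.95),
    Variant("snp_b", 0.20, maf=0.005, imputation_quality=0.99),  # dropped: MAF < 1%
    Variant("snp_c", -0.05, maf=0.30, imputation_quality=0.30),  # dropped: low quality
    Variant("snp_d", 0.30, maf=0.10, imputation_quality=0.80),
]
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0, "snp_d": 1}
score = polygenic_risk_score(dosages, variants)
```

Only `snp_a` and `snp_d` contribute to the score in this toy example, because the other two variants fail the frequency and quality filters.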

Information on family history of allergic diseases was collected at birth and defined as a binary factor indicating no family history or at least one biological parent reporting ever experiencing asthma, atopic dermatitis or hay fever.

Maternal smoking during pregnancy was defined as smoking in the second and/or third trimester of pregnancy, with controls defined as mothers who either stopped smoking before the second trimester or never smoked. Potential confounders identified from the literature were sex, age, season at blood withdrawal, cell-type proportions, Body-Mass-Index (BMI), socio-economic status (SES), and air pollution, defined as nitrogen dioxide (NO2) at birth address (Additional file 2: Table S1).

DNAm data

DNAm was measured for 256 participants from blood clots taken at six and ten years using the Methylation EPIC BeadChip (Illumina, Inc., San Diego, CA). We applied functional normalization [24] and ComBat [25] to normalize the data and remove technical variation. Probes were removed if they were located on the sex chromosomes, had missing values, or failed the detection p value of 0.01 in more than 1% of samples. Samples were removed if they were outliers, sex mismatches, or did not fulfill the bad-sample threshold of methylated and unmethylated intensities. Cell-type proportions were estimated using the EpiDISH package [26]. Further information on quality control and data processing can be found elsewhere [7].

Methylation risk scores

MRS were calculated for six allergy-related EWAS, namely high IgE [2], aeroallergen sensitization [3], asthma [5], any allergic disease [6] and two on atopy, defined as high total IgE [1] or positive specific IgE as well as a positive skin-prick test [4]. Details on the calculation and evaluation of these allergy-related MRS have been published previously [7]. In short, we calculated each MRS by weighting the CpG beta-values with the respective effect sizes identified by the EWAS, summing them, and transforming the result to z-scores. The selection of CpG sites was conducted using a pruning and thresholding approach [27]. As described previously [7], the MRS that reached the highest prediction accuracy for allergic sensitization at six years of age across all p-value thresholds was used in the downstream analyses.
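The weighting and z-score step can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure [7]: the CpG identifiers, weights and beta-values are hypothetical, the CpG set is assumed to be already selected by pruning and thresholding, and the z-transformation here uses the sample mean and sample standard deviation of the analysed participants.

```python
from statistics import mean, stdev


def methylation_risk_scores(beta_values, weights):
    """Return z-scored MRS, one per participant.

    beta_values: list of dicts mapping CpG id -> beta-value (one dict per participant)
    weights:     dict mapping CpG id -> EWAS effect size for the selected CpGs
    """
    # Weighted sum of beta-values for each participant.
    raw = [sum(weights[cpg] * betas[cpg] for cpg in weights) for betas in beta_values]
    # Standardize across participants.
    mu, sd = mean(raw), stdev(raw)
    return [(score - mu) / sd for score in raw]


# Hypothetical weights and beta-values for three participants.
weights = {"cpg_1": 0.5, "cpg_2": -0.2}
beta_values = [
    {"cpg_1": 0.2, "cpg_2": 0.5},
    {"cpg_1": 0.4, "cpg_2": 0.5},
    {"cpg_1": 0.6, "cpg_2": 0.5},
]
mrs = methylation_risk_scores(beta_values, weights)
```

After standardization the scores have mean zero and unit standard deviation, so effect estimates for different MRS are on a comparable scale.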

Statistical analysis

To evaluate whether changes in DNAm are predictors or consequences of allergic diseases, we tested the following two hypotheses: (H1, DNAm as predictor) The association between exposure (maternal smoking during pregnancy; family history of allergic disease; PRS for any allergies) and allergic sensitization is mediated by prior changes in DNAm (measured by MRS or methylation in individual CpG sites); (H2, DNAm as consequence) The association between exposure (maternal smoking during pregnancy; family history of allergic disease; PRS for any allergies) and changes in DNAm (measured by MRS or methylation in individual CpG sites) is mediated by prior allergic sensitization. In our main analyses, mediators were measured at six years and outcomes at ten years, both for hypothesis (H1) and (H2). In addition, we conducted a secondary analysis for hypothesis (H1), in which mediators (DNAm) were measured at ten years and outcome (aeroallergen sensitization) at 15 years (Fig. 1 and Additional file 1: Figure S1).

Fig. 1

Display of models used for the identification and validation of potential mediators. Hypothesis (H1) describes the mediation of aeroallergen sensitization through DNAm and hypothesis (H2) the reversed direction, in which sensitization mediates DNAm changes. Time window A covers the development from six to ten years and time window B from ten to 15 years. See also Additional file 1: Figure S1

Mediation analyses rely on the following three assumptions [28]: (1) no exposure-mediator confounding, (2) no mediator-outcome confounding and (3) no exposure-outcome confounding. To fulfill these assumptions to the best of our knowledge, we constructed directed acyclic graphs (DAGs) to visualize each of these paths using dagitty [29] (Additional file 1: Figures S2–S9). A minimal sufficient adjustment set was identified for each pathway by tracing association directions and eliminating any potential confounders already associated with a precursory confounder. Exposure-mediator models were adjusted for SES (exposure: maternal smoking during pregnancy), SES and NO2 exposure at birth (family history) and sex (PRS) for both hypotheses. Mediator-outcome models were adjusted for all potential confounders according to the DAGs (Additional file 1: Figures S2–S9). A detailed description of the definition and assessment of these covariates is provided in Additional file 1: Table S1.

Associations with continuous outcomes (MRS or DNAm in individual CpG sites) were analyzed using linear regression and associations with binary outcomes (allergic sensitization) were analyzed using logistic regression.

Causal mediation analysis of MRS

Causal mediation analysis, using the R package mediation [30], was applied to test the two hypotheses (H1) and (H2) for allergy-related MRS. Results were adjusted for multiple testing using the Benjamini–Hochberg procedure [31] to control the false-discovery rate (FDR), with all H1 tests corrected together and all H2 tests corrected together.
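For readers unfamiliar with the Benjamini–Hochberg procedure, the adjustment can be sketched in a few lines. This is a generic illustration of the standard step-up adjustment, not the code used in the study; the p-values in the example are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from four mediation tests within one hypothesis.
p_h1 = [0.01, 0.04, 0.03, 0.20]
q_h1 = benjamini_hochberg(p_h1)
significant = [q < 0.05 for q in q_h1]
```

In this toy example only the first test remains below an FDR of 0.05 after adjustment.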

High-dimensional mediation analysis of individual CpGs

High-dimensional mediation analyses (HMA) were used to test the two hypotheses for individual CpGs. H1 was tested using the Divide-Aggregate Composite-Null test (DACT), HIMA, and gene-based HMA (gHMA). H2 was tested using only DACT, because HIMA and gHMA are applicable to high-dimensional mediators but not to high-dimensional outcomes.

  1. Previous knowledge + Divide-Aggregate Composite-Null test (DACT)

    Based on previously published EWAS of total IgE [1, 2], aeroallergen sensitization [3, 4], childhood asthma [5] and any allergic disease [6], we used existing knowledge on allergy-relevant CpGs to reduce the multiple testing burden. Of the 1673 previously reported CpGs, 1501 were available in the LISA cohort, and 583 of these were significantly associated with aeroallergen sensitization in the LISA cohort at six years [7] (false discovery rate ≤ 0.05; adjusted for Houseman cell-type estimates to resemble the initial discovery analyses). These 583 CpGs were taken forward as the testing set of potential mediators. Of note, none of these CpGs was significantly associated with any of the exposures after multiple testing correction and adjustment for sex, detailed age and EpiDISH cell-type estimates (Additional file 2: Table S2).

    We used DACT for the composite null hypothesis of no mediation effect, as suggested by Liu et al. [32], to reduce the multiple testing burden. In short, DACT takes the p-values from the exposure-mediator and the mediator-outcome models and computes a new joint list of p-values, which is used to determine significance (p-value < 0.05). This is done by aggregating the weighted p-values of the three possible null hypotheses leading to no mediation effect and calibrating the result using Efron’s empirical null framework [33].

  2. HIMA

    Whereas the previous approach relied on existing knowledge as a baseline selection of mediators, HIMA as proposed by Zhang et al. [34] uses a three-step procedure to identify significant CpGs throughout the whole epigenome. First, the top CpGs with the largest effect sizes (beta of standardized inputs) for the response variable are identified using sure independence screening (SIS) [35]. The total number of top hits (N) varies per model and is calculated as N = 2*n/log(n), with n being the input sample size. To capture relevant CpGs with our smaller sample size, we applied a looser threshold than the original publication. In a second step, HIMA estimates the mediation effect using the minimax concave penalty, and in a third and final step it performs joint significance testing.

  3. Gene-based HMA (gHMA)

    We further applied gene-based high-dimensional mediation analysis (gHMA) as proposed by Fang et al. [36]. The idea behind this approach is that not single CpGs but genes act as biological units and should therefore be analyzed together. The functions further provide different modeling options for linear or nonlinear relationships and an omnibus test to combine both, which outperformed the single models in their simulation study. First, we annotated each CpG to its nearest gene within 20,000 base pairs as done previously [37], resulting in 40,916 different genes. We then applied gHMA to each of these 40,916 genes, each covering between one and 1758 CpGs, performing the linear, nonlinear and omnibus tests for significance. We used differing kernel thresholds of 0.7, 0.8 and 0.9 as values for explained variance by the kernel principal components. Results of the omnibus test were corrected using the Benjamini–Hochberg procedure [31].
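The aggregation step of DACT described under item 1 can be sketched as a weighted combination of three case-specific p-values. This sketch rests on assumptions: it follows the construction as commonly described for DACT (the exposure-mediator p-value is used when only that path is null, the mediator-outcome p-value when only that path is null, and the squared maximum of the two when both are null), the three weights are hypothetical placeholders for the estimated proportions of each null case, and neither the estimation of those proportions nor the calibration with Efron's empirical null framework is shown.

```python
def dact_composite(p_exposure_mediator, p_mediator_outcome,
                   w_only_em_null, w_only_mo_null, w_both_null):
    """Weighted composite p-value for the null hypothesis of no mediation.

    The weights are assumed to be estimated proportions of the three null
    cases; they are normalized here so that they sum to one.
    """
    total = w_only_em_null + w_only_mo_null + w_both_null
    p_max = max(p_exposure_mediator, p_mediator_outcome)
    composite = (
        w_only_em_null * p_exposure_mediator   # only exposure-mediator path is null
        + w_only_mo_null * p_mediator_outcome  # only mediator-outcome path is null
        + w_both_null * p_max ** 2             # both paths are null
    )
    return composite / total


# Hypothetical p-values for one CpG and hypothetical case weights.
p_dact = dact_composite(0.01, 0.04, w_only_em_null=0.3, w_only_mo_null=0.3, w_both_null=0.4)
```

Squaring the maximum reflects that, when both paths are null, the larger of two independent uniform p-values is itself not uniform, whereas its square is.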

Validation of CpG sites using causal mediation analysis

All significant CpG sites identified with the three methods described above were followed up using a causal mediation analysis to determine the direct, indirect, and total effects as well as the proportion mediated. Multiple testing correction followed the approach applied for the MRS evaluation: the FDR was calculated for all H1 CpGs together, and the same correction was applied separately for H2 CpGs. Models and adjustments were the same as for the MRS analyses, and single CpGs were afterwards annotated using the mQTL databases provided by Gaunt et al. and Hawe et al. [38, 39].
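How the direct, indirect and total effects and the proportion mediated relate to each other can be illustrated with the simple product-of-coefficients decomposition for linear models. This is only an illustration under that assumption: the mediation package estimates these quantities by simulation and also handles binary outcomes, which this sketch does not, and all coefficient values below are hypothetical.

```python
def decompose_effects(a, b, direct):
    """Product-of-coefficients decomposition for linear mediator and outcome models.

    a:      effect of the exposure on the mediator
    b:      effect of the mediator on the outcome, adjusted for the exposure
    direct: effect of the exposure on the outcome, adjusted for the mediator
    """
    indirect = a * b
    total = direct + indirect
    proportion_mediated = indirect / total if total != 0 else float("nan")
    return {
        "indirect": indirect,
        "direct": direct,
        "total": total,
        "proportion_mediated": proportion_mediated,
    }


# Hypothetical coefficients: direct and indirect effects point the same way.
same_sign = decompose_effects(a=0.5, b=0.4, direct=0.3)
# Hypothetical coefficients: the direct effect opposes the indirect effect.
opposite_sign = decompose_effects(a=0.5, b=0.4, direct=-0.1)
```

In the second case the total effect is smaller than the indirect effect, so the proportion mediated exceeds one and becomes hard to interpret.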

Sensitivity analyses

We conducted a set of sensitivity analyses to evaluate the robustness of associations for any CpG sites that were successfully validated in the causal mediation analysis described above.

First, to further evaluate the impact of differences in cell-type proportions on our findings, we conducted a sensitivity analysis in which we additionally adjusted all exposure-mediator associations for estimated cell types, which are otherwise only included in the mediator-outcome associations.

Second, to focus exclusively on newly developed aeroallergen sensitization in our mediation analyses with aeroallergen sensitization as outcome, we conducted a sensitivity analysis in which we excluded individuals already sensitized at baseline DNAm measurement.

Third, we conducted sex-stratified analyses, as puberty may play a role in allergen sensitization [40].

Replication of potential mediators

Single CpGs taken forward to validation in the causal mediation analysis were further tested for replication in the independent Swedish BAMSE (Swedish abbreviation for Children, Allergy, Milieu, Stockholm, Epidemiology) cohort, which recruited 4093 newborns between 1994 and 1996. Ethical approval was given by the Regional Ethics Board (EPN) and further information is available elsewhere [41]. Here, we used exposure data from birth (maternal smoking in second and/or third trimester of pregnancy, any family history of allergic diseases and the same calculated PRS for any allergic disease [7, 23]), DNAm data measured at eight years of age with the Illumina Infinium HumanMethylation450 BeadChip (Illumina Inc., San Diego, USA) [6] and outcome data (positive aeroallergen sensitization to the SX1 mix) from 16 years. Further information on genetic and DNAm data can be found in Additional file 1: Methods S1.

All analyses were performed in R [42] V.4.1.2 in LISA and V.4.1.3 in BAMSE.

Results

The total sample size for the six different models and time windows, from six to ten years (A) and from ten to 15 years (B), varied from 143 to 229, including only participants who had all necessary data available (respective exposure, DNAm and covariates) (Fig. 1 and Additional file 1: Figure S1). The majority of participants in the overall sample for all models were male (57.7%), and their blood samples were collected primarily during the allergy season from March to August. Prevalence of aeroallergen sensitization increased from baseline to follow-up in each time window, and the number of missing values for exposures ranged from six (maternal smoking) to twelve (PRS) (Table 1).

Table 1 Description of total sample of LISA participants included in this study

Causal mediation analysis for MRS

Allergy-related MRS were not found to be mediators of the association between family history of allergies and subsequent allergic sensitization (H1, Fig. 2A). However, we found significant indirect effects for the association between family history of allergies and all six allergy-related MRS with prior allergic sensitization as mediator (H2) (e.g., Indirect effect (Chen2017) = 0.081 [0.020; 0.160]). The proportion mediated by allergic sensitization ranged from 33.7% (Everson2015) to 49.6% (Zhang2019) (Table 2 and Fig. 2B). Results were robust to additional adjustment for cell-type estimates as exposure-mediator confounders in our sensitivity analysis (Additional file 2: Tables S3 and S4), while keeping the mediator-outcome confounders, including cell-type estimates, consistent.

Fig. 2

MRS as predictor or consequence of allergic disease. Significant indirect effects are indicated with an asterisk. The title follows the pattern exposure–mediator–outcome. The panels evaluate (A) whether the association between family history of allergic disease and allergic sensitization at ten years is mediated by prior changes in DNAm at six years (measured by MRS) and (B) whether the association between family history of allergic disease and changes in DNAm at ten years (measured by MRS) is mediated by prior allergic sensitization at six years. The six MRS correspond to the following phenotypes: Chen2017, total IgE; Everson2015, atopy; Peng2019, aeroallergen sensitization; Reese2019, childhood asthma; Xu2021, any allergy; and Zhang2019, atopy

Table 2 Significant mediation (FDR < 0.05) between family history as exposure and MRS as outcome, mediated by aeroallergen sensitization (H2)

We did not find any significant mediation effects for maternal smoking during pregnancy or the PRS for either of the two hypotheses. Full results for all MRS models can be found in Additional file 2: Tables S5 (H1) and S6 (H2), the latter for the time window from six to ten years only, as DNAm as an outcome was not measured at 15 years of age.

DACT

We identified 90 unique CpGs as potential mediators (H1) with the DACT approach. For the first time window (A) from six to ten years, we found 18 CpGs for maternal smoking, 51 for family history and six for the PRS. For the second time window (B) from ten to 15 years, the numbers were 20, 19 and ten, respectively. Of these, only one CpG (cg26851984) was validated in causal mediation analyses (significant indirect effect after multiple-testing correction), for time window A and maternal smoking as exposure (Table 3). Differential DNAm at cg26851984 mediated 81% of the association between maternal smoking and aeroallergen sensitization, and this result was robust to additional adjustment of the exposure-mediator association for cell-type estimates. Of note, cg26851984 is also an mQTL with 58 surrounding SNPs as reported in a recent publication by Hawe et al. [39]. A mediation plot for cg26851984 is presented in Fig. 3 (first panel), showing the validated associations with the single CpG as mediator.

Table 3 DNAm in individual CpG sites as predictors of aeroallergen sensitization (H1). Displayed CpGs were significantly validated in the causal mediation analysis (FDR < 0.05)
Fig. 3

DNAm in individual CpG sites as predictor or consequence of allergic disease. CpG sites that were identified as mediators in at least one HMA (HIMA or DACT) and validated in causal mediation analysis are presented. Significant indirect effects are indicated with an asterisk and the title follows the pattern exposure–mediator–outcome. The panels evaluate (A) whether the association between (i) maternal smoking during pregnancy/(ii) family history of allergic disease/(iii) PRS for any allergies and allergic sensitization at ten/15 years is mediated by prior changes in DNAm at six/ten years and (B) whether the association between (i) maternal smoking during pregnancy/(ii) family history of allergic disease/(iii) PRS for any allergies and changes in DNAm at ten years is mediated by prior allergic sensitization at six years. For cg19310430 there is no corresponding model for hypothesis (H2), as DNAm was not measured at 15 years

In the reversed models investigating sensitization as a potential mediator of subsequent changes in DNAm (H2), we did not identify any mediation effects for individual CpGs in either main model (Additional file 2: Table S7).

HIMA

Depending on the sample size of the different exposures and time windows, between 58 and 85 CpGs (N = 2*n/log(n); Fig. 1) with the highest effect sizes were retained in the screening step of HIMA; their mediation effects were then estimated and tested for joint significance in the respective models. We identified three CpGs as potential mediators in the time window from six to ten years (time window A), one CpG for the association between each exposure and aeroallergen sensitization. In addition, we identified four CpGs as mediators in the later time window (B) from ten to 15 years, three for PRS as exposure and one for family history (Additional file 2: Tables S7 for full results and S8 for annotated hits). Four of the seven identified CpGs were significantly validated in the causal mediation analysis and none are located in mQTLs (Table 3; Fig. 3 (panels 2–5)).
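The reported range of screened CpGs can be checked against the reported sample sizes with a short calculation. The sketch assumes the natural logarithm and rounding up to the next integer; neither is stated explicitly in the text, but both reproduce the reported bounds for the smallest (n = 143) and largest (n = 229) analysis samples.

```python
import math


def sis_screening_size(n):
    """Number of CpGs kept after screening, N = 2*n/log(n), rounded up.

    Assumes the natural logarithm and ceiling rounding.
    """
    return math.ceil(2 * n / math.log(n))


smallest = sis_screening_size(143)  # smallest reported analysis sample
largest = sis_screening_size(229)   # largest reported analysis sample
```

With these assumptions the two values come out as 58 and 85, matching the range given above.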

Sensitivity analyses

All CpGs presented in Table 3 showed nominally significant associations after additional adjustment for cell-type proportions between exposure and mediator (Additional file 2: Table S9) and when restricting the analysis sample to those who were not sensitized at the time of DNAm measurement (Additional file 2: Table S10). However, those associations were not significant after adjustment for multiple testing. We did not find sex-specific differences in mediation effects in terms of effect estimates and direction of effects, but indirect effects were only significant for three of the five CpGs in males (Additional file 2: Table S11) and for none of the CpGs in females (Additional file 2: Table S12), most likely due to the reduced sample size.

gHMA

We did not identify any significant genes for either time window or exposure with the gHMA method.

Replication in BAMSE

Data were available for 445 participants with DNAm measured at eight and aeroallergen sensitization measured at 16 years of age (Additional file 2: Table S13). Table 4 presents the results from BAMSE for our previously validated CpGs (Table 3). Due to the different arrays used in LISA and BAMSE, only two of the five CpGs were available for replication. Neither of these two CpGs could be replicated in BAMSE, but for cg26851984 the directions of the indirect and direct effects were the same as in LISA. Full results are included in Additional file 2: Table S14.

Table 4 DNAm in individual CpG sites as predictors of aeroallergen sensitization (H1). Replication of validated CpGs (Table 3) in BAMSE

Discussion

The present study investigated whether DNAm is a potential cause/predictor or a consequence/outcome of sensitization by conducting causal mediation analyses for well-known risk factors of aeroallergen sensitization as exposures (maternal smoking during pregnancy, family history of allergies, and PRS for any allergy) and data on DNAm and aeroallergen sensitization from two consecutive time points as mediators and outcomes. We found evidence that DNAm in most previously identified CpG sites (summarized in MRS) was a consequence rather than a cause of aeroallergen sensitization. In addition, we identified five single CpGs that mediated the association between maternal smoking during pregnancy, family history of allergic diseases or a PRS and subsequent aeroallergen sensitization, thus serving as predictors of sensitization. Taking both hypotheses together, we suggest that DNAm can be a cause as well as a consequence of aeroallergen sensitization, depending on the genomic location.

This study further attempted replication of identified CpGs in the independent Swedish BAMSE cohort but could not significantly replicate any of the five reported CpGs. This might, however, not necessarily negate our findings, as three of the five CpGs were not measured in BAMSE (450K chip vs. EPIC chip in LISA). Furthermore, the time difference is larger between the two assessment points in BAMSE (eight to 16 vs. ten to 15 in LISA). To the best of our knowledge, there are no previous studies investigating causal epigenetic mediation between prenatal exposures and aeroallergen sensitization in childhood and adolescence. Previous studies have reported mediation effects of DNAm for the associations between body-mass-index (BMI) and trajectories with asthma [43], BMI and cardio-metabolic risk [44], and age at puberty onset and lung function [45]. Of note, none of these studies investigated both directions, DNAm as both a predictor (H1) and as a consequence (H2).

Publications investigating mQTLs found that DNAm changes are often seen as a consequence of diseases rather than their cause [46] and this is supported by our findings on the allergy-related MRS. However, in the present study we also identified CpGs which serve as mediators for the association between known determinants of allergies and aeroallergen sensitization. Of note, none of the identified single CpGs are part of the evaluated MRS after clumping and thresholding, even though one has been previously reported by the same EWAS as an associated CpG site (Peng [3]). This might indicate that DNAm acts in both effect directions, represented by differing sets of CpG loci.

On the one hand, our finding that MRS are a consequence rather than a cause of sensitization is in line with our previous results [7], which might also reflect the fact that the pre-identified CpGs were reported in mostly cross-sectional EWAS. On the other hand, the single CpGs mediating the effect of prenatal exposures on aeroallergen sensitization later in life might be used as early predictors of disease development. These should be followed up in future studies to further determine their clinical relevance.

For cg26851984, which was identified as a mediator of the association between maternal smoking during pregnancy and sensitization with DACT, we identified the closest gene to be PRPF3. This gene is associated with eczema [13], eosinophil counts [47] and any allergy [15], supporting the importance of this CpG as a mediator of allergen sensitization. Of note, this CpG was previously reported in an EWAS on aeroallergen sensitization [3], as only previously known CpGs were tested as potential mediators with the DACT method. However, it is not part of the allergy-related MRS previously calculated based on these EWAS [7] after clumping and thresholding. Further, it is an mQTL, and its associations have to be interpreted with caution, as effects here could be attributable to surrounding SNPs. This may explain the higher mediation effect size (0.139 for maternal smoking as exposure and cg26851984 as mediator) compared to all others (≤ 0.108), as well as the higher, albeit non-significant, proportion mediated of 81.1%.

Other CpGs identified with the hypothesis-generating HIMA approach were also located in proximity to allergy-relevant genes. The CpG cg17992705 is located at an exon boundary of ATXN2L, which is associated with forced vital capacity [48], a lung function parameter that is reduced in asthma patients. Further, DIP2C (cg12724894) and ASB2 (cg03389164) are associated with eosinophil counts [49, 50], with the CpGs located within the gene body and promoter, respectively.

Looking at Figs. 2 and 3, it can be seen that not all total effects are significant even where the indirect effects are. While a significant total effect was a prerequisite of potential mediation in the traditional causal steps approach proposed by Baron and Kenny in 1986 [51], it is not a formal requirement in the causal mediation analysis approach we used, although the lack of a significant total effect reduces the statistical power to detect indirect effects [50, 51]. While all of our exposures are known risk factors for aeroallergen sensitization, they might not necessarily show significance in our reduced sub-sample. The total effect is defined as the sum of the direct and all indirect effects, and we do sometimes observe opposite effect signs for direct and indirect effects (e.g., cg17992705), which can attenuate the total effects.

The present study has multiple strengths. We have objectively measured data on all levels of the analysis for the model in which PRS is the exposure, as neither PRS, DNAm, nor blood-measured aeroallergen sensitization is subject to recall bias. In addition, the LISA study is a well-established prospective German birth cohort with still ongoing follow-up and provides a valuable data source for studying allergic diseases. The longitudinal design also supports the causal interpretation, as it allowed mediators and outcomes to be measured in temporal succession: DNAm was measured repeatedly, at both six and ten years, and consecutive time points were used for the definition of exposure, mediator, and outcome. This longitudinal design might also enable future analyses, ideally paired with similar studies with comparable design to reach higher statistical power for epigenome-wide mediation analyses. Further, we applied three different HMA methods complemented with causal mediation analysis to investigate their applicability to the allergic context in contrast to simpler screening methods for reduction of the multiple-testing burden. Each HMA approach is based on different assumptions and uses different strategies to deal with the challenges of multiple testing.

Limitations of the presented study include the small sample size, which might be insufficient to detect all potential mediation effects, especially as effects of single CpGs are rather small. This might also explain why we could not replicate single CpGs in both time windows (A and B) or why we did not find significant gene units using the gHMA approach. It could also be speculated that single CpGs might be more relevant in relation to allergic sensitization than methylation across a whole gene, as this is the biggest difference between gHMA as a gene-based approach and the others (HIMA and DACT) as CpG-based approaches. Further, when applying the PRS as an exposure, we did not check whether there is significant mediation between single SNPs and CpGs, but with the development of relevant methodology [52] this is of great interest for future studies. MRS were further determined according to their cross-sectional prediction accuracy and not optimized according to their performance in a prospective or mediation setting as applied here. Another general issue might be confounding, which is a serious problem in mediation analysis [28]. We adjusted our models based on DAGs to the best of our knowledge; however, unmeasured confounding cannot be ruled out completely in observational studies.

Conclusions

In conclusion, we found indications that DNAm could either be the cause of allergic sensitization or the consequence thereof, depending on the genomic location. The two different sets of DNAm patterns, namely MRS as consequence of sensitization or single CpGs as cause, have differing clinical implications: While MRS might be considered as cross-sectional biomarkers, the single CpGs might be clinically relevant early predictors of sensitization and should be investigated in future studies.