Background

Chronic Obstructive Pulmonary Disease (COPD) is a progressive inflammatory lung disease characterized by persistent airway obstruction that causes severe respiratory symptoms and poor quality of life [1]. Although smoking is generally considered the main environmental risk factor, an estimated 25–45% of patients with COPD have never smoked [2]. Despite extensive research, the etiology of COPD remains incompletely understood. It is known that the development of this complex heterogeneous disease is influenced by both genetic and environmental factors, as well as their interactions [3,4,5,6]. As an interface between the inherited genome and environmental exposures, an important role has been postulated for the epigenome [7]. The epigenome includes multiple epigenetic mechanisms that affect gene expression without modifying the DNA sequence. These epigenetic mechanisms are highly dynamic and respond to environmental exposures, ageing and diseases [8]. One such epigenetic mechanism is DNA methylation, which involves the binding of a methyl group to a cytosine base located adjacent to a guanine base. Methylation of these so-called CpG-sites in regulatory regions of the DNA generally results in decreased expression of a particular gene [9].

So far, only a few studies have investigated the association between DNA methylation in peripheral blood and COPD or lung function using an epigenome-wide, hypothesis-free approach [10,11,12,13,14,15,16,17]. Although findings across the studies are not consistent, there is suggestive evidence that alterations in DNA methylation might play a role in the etiology of COPD. However, in previous studies, subjects were mainly included irrespective of smoking status, thus including current smokers, ex-smokers and never smokers. As a consequence, it is currently not known whether DNA methylation differs between healthy individuals and patients with COPD who have never smoked. Recently, we studied the association between epigenome-wide DNA methylation and COPD in both current smokers and never smokers [16]. Although we did not find any epigenome-wide significant association in either current smokers or never smokers, the associations between DNA methylation and COPD differed between the two groups. Hence, by further exploring the role of DNA methylation in a much larger set of never smokers together with a continuous measurement of lung function, we might be able to reveal important novel insights into the etiology of COPD. In this study, we aim to assess the association between DNA methylation and lung function in never smokers by meta-analyzing four independent population-based cohorts.

Methods

Study population

To study the association between epigenome-wide DNA methylation and lung function in never smokers, we performed a meta-analysis in four different cohorts. Lung function was defined as the ratio between the Forced Expiratory Volume in 1 s (FEV1) and the Forced Vital Capacity (FVC). Two cohorts originated from the LifeLines population-based cohort study [18]: the LifeLines COPD & Controls DNA methylation study [16, 19] (LL COPD&C, n = 903) and the LifeLines DEEP study [20] (LLDEEP, n = 166). The two other cohorts originated from the population-based Rotterdam study (RS) [21]: the first visit of the third RS cohort (RS-III-1, n = 150) and a cohort selected for the Biobank-based Integrative Omics Studies (BIOS) project (RS-BIOS, n = 206). Both population-based cohort studies were approved by the local university medical hospital ethical committees and all participants signed written informed consent. In all cohorts, never smoking was defined based on a self-reported never-smoking history and 0 pack years as recorded in the standardized questionnaires.

Measurements

Lung function

Within the LifeLines population-based cohort study, pre-bronchodilator spirometry was performed with a Welch Allyn Version 1.6.0.489, PC-based Spiroperfect with CA Workstation software according to ATS/ERS guidelines. Technical quality and results were evaluated by well-trained assistants, and results that were difficult to interpret were re-evaluated by a lung physician. Within the population-based Rotterdam study, pre-bronchodilator spirometry was performed during the research center visit by trained paramedical staff using a SpiroPro portable spirometer (RS-III-1) or a Master Screen® PFT Pro (RS-BIOS) according to the ERS/ATS guidelines. Spirometry results were analyzed by two researchers and verified by a specialist in pulmonary medicine.

DNA methylation

In all four cohorts, DNA methylation levels in whole blood were determined with the Illumina Infinium Methylation 450K array. Data were presented as beta values (the ratio of the methylated probe intensity to the overall intensity), ranging from 0 to 1. Quality control was performed for all datasets separately as described before [19, 22]. After quality control, data were available on 396,243 CpG-sites in all four datasets.
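The beta value defined above can be illustrated with a minimal sketch. The function name and the optional `offset` argument are hypothetical and not taken from the study; with the default offset of 0 the function reproduces the plain ratio of methylated to overall intensity described in the text.

```python
def beta_value(methylated, unmethylated, offset=0.0):
    """Return the beta value of one probe: methylated / overall intensity.

    `offset` is a hypothetical stabilising constant added to the denominator;
    the default of 0.0 gives the plain ratio described in the text.
    """
    total = methylated + unmethylated + offset
    if total <= 0:
        raise ValueError("overall probe intensity must be positive")
    return methylated / total


# Illustrative intensities only (not study data).
example_beta = beta_value(methylated=300.0, unmethylated=100.0)
```

With non-negative intensities the result always lies between 0 (fully unmethylated) and 1 (fully methylated).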

Statistical analysis

Epigenome-wide association study and meta-analysis

We performed an epigenome-wide association study (EWAS) on lung function, defined as FEV1/FVC, in all four cohorts separately using robust linear regression analysis in R. The analysis was adjusted for the potential confounders age and sex. To adjust for the cellular heterogeneity of the whole blood samples, we included proportional white blood cell counts of mononuclear cells, lymphocytes, neutrophils and eosinophils, obtained by standard laboratory techniques. For LL COPD&C, we adjusted for technical variation by performing a principal components analysis using the 220 control probes incorporated in the Illumina 450K chip. The 7 principal components that each explained > 1% of the technical variation were included in the analysis. For LLDEEP, data on technical variance were not accessible. For the two RS cohorts, we included the position on the array and the array number to adjust for technical variation. Regression estimates from the four individual EWA studies were combined in an inverse-variance weighted random-effect meta-analysis, using the effect estimates and standard errors, with the “rmeta” package in R. CpG-sites with a p-value below 1.26 × 10^−7 (Bonferroni correction for the number of CpG-sites: 0.05/396,243) were considered epigenome-wide significant. CpG-sites with a p-value below 0.0001 in the meta-analysis were defined as top associations in our study.
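The pooling step and the significance threshold can be sketched as follows. This is an illustration in Python rather than the R code used in the study, and it assumes the DerSimonian-Laird estimator for the between-study variance, which the text does not state explicitly. All function names and the example numbers are hypothetical.

```python
import math


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def random_effects_meta(estimates, std_errors):
    """Inverse-variance weighted random-effect meta-analysis for one CpG-site.

    Assumes the DerSimonian-Laird estimator for the between-study variance.
    Returns (pooled estimate, pooled standard error, tau squared, p-value).
    """
    k = len(estimates)
    if k != len(std_errors) or k < 2:
        raise ValueError("need matching estimates and standard errors, k >= 2")

    # Fixed-effect (inverse-variance) weights and pooled estimate.
    w = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(w)
    fixed = sum(wi * b for wi, b in zip(w, estimates)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance.
    q = sum(wi * (b - fixed) ** 2 for wi, b in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c)

    # Random-effect weights include the between-study variance.
    w_star = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_w_star = sum(w_star)
    pooled = sum(wi * b for wi, b in zip(w_star, estimates)) / sum_w_star
    pooled_se = math.sqrt(1.0 / sum_w_star)

    # Two-sided p-value from the standard normal distribution.
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, tau2, p_value


# Threshold used in the text: 0.05 divided by the number of CpG-sites.
threshold = bonferroni_threshold(0.05, 396243)

# Illustrative per-cohort estimates and standard errors (not study data).
pooled, pooled_se, tau2, p_value = random_effects_meta(
    [0.10, 0.10, 0.10, 0.10], [0.20, 0.20, 0.20, 0.20]
)
```

When the cohort estimates agree, the between-study variance is estimated as zero and the result equals the fixed-effect inverse-variance estimate; heterogeneity between cohorts widens the pooled standard error.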

Expression quantitative trait methylation (eQTM) analysis

To assess whether top associations were also associated with gene expression levels, we used the never smokers included in the Biobank-based Integrative Omics Studies (BIOS). For all cohorts separately, reads were normalized to counts per million. To adjust for technical variation in gene expression and DNA methylation, principal component analysis was conducted on the residual normalized counts and beta values, from which the effects of the potential confounders age and sex had been removed. Principal components that explained more than 5% of the technical variation in gene expression or DNA methylation were included in the analysis. Subsequently, robust linear regression analysis was performed on the CpG-sites and the genes within 1 MB around the CpG-sites. The analyses were adjusted for the potential confounders age and sex and for technical variation by the principal components described above. The individual eQTM analyses were combined by a random-effect meta-analysis using the effect estimates and standard errors in the “rmeta” package. An eQTM was considered significant when the p-value, Bonferroni-adjusted for the number of genes within 1 MB around the CpG-sites, was below 0.05.
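The normalization, gene selection and multiple-testing steps of the eQTM analysis can be sketched as below. This is a simplified illustration and not the study pipeline: it assumes each gene is represented by a single position on the same chromosome as the CpG-site, and that the Bonferroni correction is applied to the set of genes tested for one CpG-site. All function names and example values are hypothetical.

```python
def counts_per_million(counts):
    """Scale the raw read counts of one sample so that they sum to one million."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("sample has no reads")
    return {gene: count * 1e6 / total for gene, count in counts.items()}


def genes_in_window(cpg_position, gene_positions, window=1_000_000):
    """Return genes located within `window` base pairs of the CpG-site.

    Assumes every gene has one representative position on the same
    chromosome as the CpG-site.
    """
    return [
        gene
        for gene, position in gene_positions.items()
        if abs(position - cpg_position) <= window
    ]


def significant_eqtms(p_values, alpha=0.05):
    """Bonferroni-adjust p-values for the number of genes tested.

    Returns the adjusted p-values of the genes that remain below `alpha`.
    """
    n_genes = len(p_values)
    adjusted = {gene: min(1.0, p * n_genes) for gene, p in p_values.items()}
    return {gene: p for gene, p in adjusted.items() if p < alpha}


# Illustrative values only (not study data).
cpm = counts_per_million({"GENE_A": 1, "GENE_B": 3})
nearby = genes_in_window(
    5_000_000, {"GENE_A": 4_200_000, "GENE_B": 6_500_000, "GENE_C": 6_000_000}
)
hits = significant_eqtms({"GENE_A": 0.001, "GENE_C": 0.04})
```

In this example GENE_B lies more than 1 MB from the CpG-site and is not tested, and GENE_C no longer passes the 0.05 cut-off after adjustment for two tested genes.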

Results

Subject characteristics

An overview of the characteristics of the subjects included in the study is shown in Table 1. LL COPD&C was the largest cohort included in this meta-analysis. Notably, since this cohort is a non-random selection from the LifeLines cohort study with COPD (defined as FEV1/FVC < 0.70) as one of the selection criteria, the percentages of COPD cases should not be interpreted as prevalence.

Table 1 Subject characteristics of the subjects from the four different DNA methylation datasets

Meta-analysis of the four epigenome-wide association studies

The meta-analysis of the four different cohorts did not reveal CpG-sites that were associated with FEV1/FVC at epigenome-wide significance. We identified 36 CpG-sites as our top associations (Table 2). The Manhattan plot of the meta-analysis is shown in Fig. 1a. Forest plots of the three most significant CpG-sites are shown in Fig. 1b-d: cg10012512, located in the intergenic region of chromosome 7q36.3 (p = 5.94 × 10^−7); cg02885771, annotated to LTV1 Ribosome Biogenesis Factor (LTV1) (p = 4.10 × 10^−6); and cg25105536, annotated to Kelch Like Family Member 32 (KLHL32) (p = 9.09 × 10^−6). An overview of all CpG-sites associated with FEV1/FVC at a nominal p-value of 0.05 can be found in Additional file 1: Table S1.

Table 2 Results of the meta-analysis and individual EWA studies on FEV1/FVC in never smokers
Fig. 1 Manhattan and forest plots of the meta-analysis of four independent epigenome-wide association studies on FEV1/FVC in never smokers. a Manhattan plot in which every dot represents an individual CpG-site. Location on the X-axis indicates the chromosomal position and location on the Y-axis indicates the −log10 p-value of the meta-analysis. The dotted horizontal line indicates a p-value of 0.0001; the solid horizontal line indicates epigenome-wide significance (p-value < 0.05/396,243 = 1.26 × 10^−7). b-d Forest plots showing the effect estimates and standard errors of the 4 independent EWA studies and the meta-analysis for the top hits cg10012512 (b), cg02885771 (c) and cg25105536 (d)

The direction of the effect of the 36 top CpG-sites did not change in a sensitivity analysis in the LL COPD&C cohort excluding the subjects that were exposed to environmental tobacco smoke (ETS) (n = 659 subjects) (Additional file 2: Table S2).

Expression quantitative trait methylation (eQTM) analysis

In total, 803 genes were located within 2 MB of the 36 CpG-sites. The expression of 11 genes was significantly associated with DNA methylation levels at 9 different CpG-sites (Table 3). DNA methylation at cg25105536, annotated to KLHL32, was significantly associated with gene expression levels of KLHL32. DNA methylation levels at cg08065963, located in the intergenic region on chromosome 16 and not yet annotated to a gene, showed a significant association with gene expression levels of 4-Aminobutyrate Aminotransferase (ABAT). For the other 7 CpG-sites, DNA methylation levels were associated with gene expression levels of one or two genes other than the previously annotated genes. An overview of the associations between DNA methylation and gene expression levels of all genes can be found in Additional file 3: Table S3.

Table 3 Overview of the results of the meta-analysis of the eQTM analysis

Discussion

This study is the first large general population-based EWA study on lung function in never smokers. So far, virtually all EWA studies on the origin of COPD included subjects with a history of cigarette smoking. As a consequence, these studies mainly addressed the origins of COPD in response to smoking. It is unclear whether the results of these studies help to explain the etiology of COPD or rather explain the contribution of cigarette smoke to the disease. Therefore, our study makes an important contribution to the current understanding of COPD in never smokers.

We identified 36 CpG-sites that were associated with FEV1/FVC at a p-value below 0.0001. The top hit of our meta-analysis, cg10012512, is located in the intergenic region of chromosome 7q36.3. It is therefore not possible to speculate on the functional effect of differences in DNA methylation at this specific CpG-site and how these differences may affect FEV1/FVC. While associations found with an eQTM analysis may help to gain more insight into the function of a CpG-site, our eQTM analysis did not reveal any nominally significant associations for cg10012512. However, this CpG-site was differentially methylated between never smokers and current smokers [23]. Presumably, this CpG-site also responds to other inhaled deleterious substances, which in turn affects lung function. The second top hit, cg02885771, located on chromosome 6q24.2, is annotated to LTV1. Previously, this CpG-site has been associated with asthma in airway epithelial cells [24] and LTV1 was shown to be expressed in lung tissue in the Genotype Tissue Expression (GTEx) project. Although studies in yeast describe LTV1 as a conserved 40S-associated biogenesis factor that functions in small subunit nuclear export, a specific role for LTV1 in respiratory diseases is not known [25]. The third top hit, cg25105536, is annotated to KLHL32 on chromosome 6q16.1, and we found a significant association between DNA methylation levels of cg25105536 and gene expression levels of KLHL32. The function of KLHL32 is poorly understood; however, four genetic variants in the KLHL32 gene have been associated with FEV1 and FEV1/FVC in African American subjects with COPD and a history of smoking [26]. Notwithstanding the fact that these associations were only identified in a specific group, this might suggest a role for KLHL32 in the respiratory system. Next to KLHL32, we found that gene expression levels of 10 additional genes were significantly associated with DNA methylation levels at one of the 36 CpG-sites. The CpG-site cg08065963, which was not yet annotated to a gene, was significantly associated with 4-Aminobutyrate Aminotransferase (ABAT). Interestingly, a role for ABAT in COPD has not been described before. The remaining nine genes were genes other than the annotated genes of the particular CpG-sites. This suggests that the CpG-sites may also regulate distant genes within a region of 2 MB, which complicates the functional assessment of differences in DNA methylation even further.

To the best of our knowledge, there are eight studies in the literature describing the association between DNA methylation and lung function (Table 4). Six of these studies included both subjects with and without a history of cigarette smoking and, except for the study by Qiu et al., adjusted for smoking status in the statistical analysis. In addition, the recent study by Imboden et al. performed analyses with and without adjustment for smoking status and pack years. Altogether, these seven studies identified 462 unique CpG-sites. Interestingly, none of the 36 CpG-sites from our meta-analysis in never smokers were among these 462 previously identified CpG-sites (Table 5). Apparently, these 36 CpG-sites are only associated with lung function level in never smokers. The fact that 17 CpG-sites (47%) were associated at a nominal p-value < 0.05 with COPD (dichotomously defined as an FEV1/FVC ratio below 70%) in the never smokers of our previous smoking-stratified EWAS further supports this assumption [16]. There is, however, one exception: cg22742965, annotated to Transmembrane Protein With EGF Like And Two Follistatin Like Domains 2 (TMEFF2), was also significantly associated with COPD in smokers. Most likely, this CpG-site shows a general response to inhaled deleterious substances such as cigarette smoke and other, as yet unknown, substances.

Table 4 Overview of studies reporting results of differential DNA methylation with lung function or COPD in whole blood
Table 5 Overview of CpG location, gene annotation, gene function and literature comparison of the top 36 CpG-sites of the meta analysis

Assuming that the observed differential DNA methylation at the majority of the CpG-sites in our study occurs without exposure to smoking, the question arises why this differential DNA methylation is observed. One possible explanation is that other environmental factors, such as air pollution and job-related exposures, are responsible for the observed differences in DNA methylation. Recently, we studied the epigenome-wide association between DNA methylation and exposure to air pollution and job-related exposures in a selection of the LifeLines population cohort including both never and current smokers [19, 27]. While we did find significant associations, none of them were replicated in independent cohorts. Additional analyses in never smokers for this paper did not reveal novel associations between DNA methylation and environmental exposures (Additional file 4: Table S4 and Additional file 5: Figure S1). This might be due to a lack of power, since only a small percentage of the never smokers in the LL COPD&C cohort have been exposed to environmental exposures. Moreover, exposure levels to air pollution in the LL COPD&C cohort are relatively low compared to the average Dutch levels determined within the 2012 Dutch national health survey as described by Strak et al. [28]. Next to environmental exposures, another explanation may be that a reduced lung function level precedes the differences in DNA methylation. However, given the cross-sectional design of this study, we cannot draw conclusions on the direction of the association or on causality. Large longitudinal studies are required to investigate causality between DNA methylation and FEV1/FVC. Moreover, such studies would offer the opportunity to investigate whether low levels of FEV1 and decline in FEV1 over the years are associated with DNA methylation in never smokers.

Conclusions

With this study we show that epigenetics indeed may be associated with FEV1/FVC in subjects who never smoked. Moreover, since 35 out of the 36 identified CpG-sites are unique for never smokers, our data suggest that factors other than smoking affect FEV1/FVC via DNA methylation.