Background

Tobacco smoke is the most important COPD risk factor. Yet not all smokers develop the disease during their lifetime [1]. The interaction between rare, genetically determined alpha-1 antitrypsin (AAT) deficiencies and smoking on emphysema risk illustrates the relevance of genetic susceptibility and gene-environment interactions. The protease inhibitor AAT, encoded by SERPINA1, prevents the degradation of the extracellular matrix and the destruction of alveoli by neutrophil elastase. Both tobacco smoke and AAT deficiency cause neutrophil recruitment in the lung. Pro-inflammatory and oxidative processes further diminish the anti-proteolytic activity of AAT [2, 3].

Genome-wide association studies (GWAS) initially suggested that common SERPINA1 variants might influence COPD risk and associated lung function phenotypes. However, reported associations with common SERPINA1 single nucleotide polymorphisms (SNPs) were shown to reflect their linkage with more penetrant rare variants [4, 5]. In a recent GWAS restricted to smokers, SERPINA1 variants were not associated with the level of lung function [6]. In the largest GWAS meta-analysis to date, a genetic risk score including 95 SNPs independently associated with lung function or COPD did not contain SNPs from SERPINA1. Nevertheless, an over-representation of genetic variants related to the elastic-fibre pathway was observed [7].

The SERPINA1 gene is complex, with eleven splicing isoforms differing in the SERPINA1 5’-UTR, tissue expression and secondary structure. This suggests that the gene’s pathophysiological role may be more strongly determined by differences in expression, regulation or posttranscriptional modification than by genetic variation [8]. A recent family-based study of predominantly smoking adults without severe AAT deficiency investigated SERPINA1 methylation. Methylation at two CpG sites was associated with COPD risk, forced expiratory volume in 1 s (FEV1) and the ratio of FEV1 to forced vital capacity (FEV1/FVC) [9].

Our candidate gene study is the first to investigate the influence of SERPINA1 methylation on lung function in a general population. Specifically, we tested the association of a comprehensive set of methylation signals in the SERPINA gene cluster with lung function levels and with 10- to 15-year lung function decline in adult smokers from three population-based European cohorts, and with lung function growth in tobacco-smoke exposed children from the ALSPAC birth cohort.

Methods

Study design

Cross-sectional and longitudinal analyses used data from three European adult cohorts and one European birth cohort with longitudinal data on lung function and DNA methylation (DNAm).

Ethics approval and consent to participate

All studies were approved by the local ethics committees and participants or their guardians provided written consent prior to taking part in the study.

Adult cohorts and participants

Participants came from three population-based studies: the Swiss Study on Air Pollution and Lung and Heart Disease in Adults (SAPALDIA) [10, 11], the European Community Respiratory Health Survey (ECRHS) [12] and the Northern Finland Birth Cohort 1966 (NFBC1966, hereafter NFBC) [13]. Participation across the studies included structured questionnaires, pre-bronchodilation spirometry, and blood sampling for DNA extraction and analysis. SAPALDIA and ECRHS share a harmonized respiratory health protocol. The study sample was restricted to ever smokers aged ≥25 years with valid lung function data, information on relevant covariates, and DNA samples from two follow-ups subjected to methylome typing in the context of the Aging Lungs in European Cohorts (ALEC) project. The final sample size was 1076 (n = 561 SAPALDIA, n = 267 ECRHS, and n = 248 NFBC).

Children and adolescent cohort and participants

The Avon Longitudinal Study of Parents and Children (ALSPAC) consisted of 68 follow-up assessments between birth and 18 years [14]. Spirometry was performed at ages 8.5 and 15 years. Participants were restricted to children and adolescents with DNA methylome and valid lung function data from two time points, as well as information on relevant covariates. The study sample included 259 children exposed to tobacco smoke (mother smoked during pregnancy and/or child lived with a smoker and/or reported smoking at least twice in their lifetime).

Lung function

Pre-bronchodilation spirometry was performed by trained personnel according to the ATS/ERS recommendations [15]. FEV1, FVC, and FEV1/FVC were the lung function parameters considered. In SAPALDIA, parameters were derived from 2001 and 2010 measurements and corrected for the change in spirometers from SensorMedics to ndd EasyOne [16]. In ECRHS, they were derived from 1998 and 2008 measurements and corrected for the change from several spirometers to ndd EasyOne. In NFBC, they were measured with a Vitalograph P-model spirometer in 1997 and a MasterScreen Pneumo spirometer in 2012. ALSPAC parameters were derived from 2000 and 2016 spirometries obtained from the same brand spirometer [17].

DNA methylation measurement

DNAm of autosomes obtained at the two time points of lung function measurement was the predictor of interest. DNA was extracted from peripheral whole blood in all cohorts. In SAPALDIA and ALSPAC, DNAm was measured at both time points using the Infinium HumanMethylation450K BeadChip (Illumina, Inc.); in NFBC, using the Infinium HumanMethylation450K BeadChip at the first time point and the EPIC BeadChip at the second time point; and in ECRHS, using the EPIC BeadChip at both time points.

For ECRHS and SAPALDIA, samples were randomly distributed for bisulphite conversion, and within methylome typing batches the samples from both time points of the same person were placed next to each other on the array. In NFBC1966, DNAm data were recorded in two batches following the clinical assessments of participants at ages 31 and 46.

The methylation level (β value) was derived from raw intensities after pre-processing with the R package minfi [18] followed by beta-mixture quantile normalization (BMIQ) [19] in SAPALDIA, or with RnBeads [20] followed by quantile normalization (QN) in ECRHS. In NFBC, the CPACOR pipeline [21] was used to pre-process and prepare β values. In all adult cohorts, normalized β values were regressed on principal components derived from the array control probes, which reflect technical bias. The resulting residuals were used as predictors in the associations with lung function.
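
The residualization step described above can be sketched as follows. This is a minimal illustration in plain Python, assuming the control-probe principal component scores have already been computed; the function names `solve_linear_system` and `residualize` and the example numbers are hypothetical and are not taken from the cohort pipelines.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def residualize(beta_values, pc_scores):
    """Residuals of one CpG's beta values after OLS on PC scores plus intercept.

    beta_values: one normalized beta value per sample.
    pc_scores:   one list of control-probe PC scores per sample (assumed given).
    """
    design = [[1.0] + list(pcs) for pcs in pc_scores]
    p = len(design[0])
    # Normal equations: (X'X) coef = X'y
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(design, beta_values)) for i in range(p)]
    coef = solve_linear_system(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in design]
    return [y - f for y, f in zip(beta_values, fitted)]


# Hypothetical data: four samples, one control-probe PC.
residuals = residualize([0.50, 0.60, 0.40, 0.70], [[-1.0], [0.0], [1.0], [2.0]])
print(residuals)
```

Because the model includes an intercept, the residuals sum to zero and are uncorrelated with each principal component, which is the property that makes them usable as technically adjusted predictors.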

In ALSPAC, analogous standard quality control procedures were applied. In addition, genotype probes on the HumanMethylation450K were compared between samples from the same individual and against SNP-chip data to identify and remove any sample mismatches. Data were pre-processed in R (version 3.0.1) with the wateRmelon package according to the subset quantile normalization approach [22] to reduce non-biological differences between probes. Technical batch effects at each methylation time point were adjusted for by including ten surrogate variables in the models.

Methylation signals considered in SERPINA gene cluster

The human serine protease inhibitor (serpin) gene cluster is located at 14q32. It consists of eleven functionally diverse serpin genes within a region of approximately 400 kb in length. Gene sub-clusters consist of four, three, and four genes each. The best characterized and proximal sub-cluster, with a length of about 107 kb, includes SERPINA1 as well as an antitrypsin-related pseudogene (ATR, SERPINA2; ∼13 kb downstream), the corticosteroid-binding globulin gene (CBG, SERPINA6; ∼68 kb downstream), and the protein Z inhibitor gene (ZPI, SERPINA10; ∼100 kb downstream).

Because the tissue-specific expression of different genes in the serpin cluster is regulated by chromosomal elements and chromatin structure, and in the absence of knowledge about the relevance of more distal methylation signals on gene expression, we included in the analysis all 119 CpGs located 99 kb downstream (PPP4R4) and 376 kb upstream (GSC) from the SERPINA1 gene. The CpGs were allocated to 12 genes: PPP4R4, SERPINA10, SERPINA6, SERPINA1, SERPINA11, SERPINA9, SERPINA12, SERPINA4, SERPINA5, SERPINA3, SERPINA13, and GSC (Fig. 1: location of CpG sites considered; Fig. 2: correlation of methylation at different CpGs at both time points).

Fig. 1

Chromosome 14 and SERPINA gene cluster located between PPP4R4 and GSC genes

Fig. 2

a-b Heatmaps of the correlation of methylation at 119 CpGs in the SERPINA gene cluster in SAPALDIA, at the first (a; T1) and second (b; T2) time points. CpGs located in the SERPINA1 gene are highlighted in black and labeled. Both figures were created with the R software

Covariate information

Information on participants’ age, sex, education, height, and smoking status was derived from questionnaires administered at the two time points. Cell proportions in the respective blood samples were estimated using the Houseman method [23] implemented in the minfi package [18].

For sensitivity analysis, SAPALDIA provided information on concentrations of high-sensitivity C-reactive protein (CRP) and AAT in blood samples collected at the first time point. CRP and AAT concentrations were measured by latex-enhanced immunoturbidimetric assays (Roche Cobas Integra analyzer; Roche Diagnostics). Inter-assay coefficients of variation were below 5% and lower detection limits were 0.1 mg/l (CRP) and 0.21 (AAT) [24]. In addition, SAPALDIA provided genetic information on the rare AAT deficiency variants PiZ and PiS [4].

Statistical analyses

Cohort-specific analyses on lung function and its decline in adults

Within each adult cohort, four statistical models were run: a) cross-sectional associations of DNAm with lung function at T1, b) cross-sectional associations of DNAm with lung function at T2, c) repeat cross-sectional associations of DNAm and lung function at both time points, and d) predictive associations of DNAm at T1 with annual lung function decline between T1 and T2, calculated as (lung function at T2 - lung function at T1) divided by the time of follow-up in years.
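
The outcome of model d) is a simple difference quotient. A minimal sketch, with a hypothetical function name and made-up values:

```python
def annual_change(value_t1, value_t2, years_between):
    """Annual change in a lung function parameter between T1 and T2.

    Negative values indicate decline; positive values indicate growth.
    """
    if years_between <= 0:
        raise ValueError("follow-up time must be positive")
    return (value_t2 - value_t1) / years_between


# Hypothetical participant: FEV1 of 3.50 L at T1 and 3.20 L at T2, 10 years apart.
fev1_change = annual_change(3.50, 3.20, 10.0)
print(round(fev1_change, 3))  # -0.03 (litres per year)
```

With this sign convention, a negative regression coefficient for methylation corresponds to accelerated decline in adults, and the same formula yields growth in children and adolescents.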

The absolute levels of lung function (FEV1, FVC, FEV1/FVC) were regressed on the residuals of methylation by fitting linear regression models (cross-sectional models) and a mixed model with a random effect for the subject (repeat cross-sectional model), respectively. Sample size was kept constant across all models and analyses. Models were a priori adjusted for the following covariates associated with lung function level at P-value < 0.05: study center, age, age², education, height, (height-mean(height))², sex, sex*age, sex*age², sex*height, sex*(height-mean(height))², and cell composition (CD8T; CD4T; NK; Bcell; Mono and Eos).
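
To make the quadratic and interaction terms in the covariate list concrete, the following sketch builds them for one participant. The function name, the dictionary keys and the 0/1 coding of sex are assumptions made for illustration; study center, education and cell composition would enter the model as additional columns and are omitted here.

```python
def covariate_terms(age, height, sex, mean_height):
    """Age, height and sex terms of the adult models for one participant.

    sex is coded 0/1 (an assumed coding); mean_height is the sample mean
    used to center height before squaring.
    """
    height_centered_sq = (height - mean_height) ** 2
    return {
        "age": age,
        "age_sq": age ** 2,
        "height": height,
        "height_centered_sq": height_centered_sq,
        "sex": sex,
        "sex_x_age": sex * age,
        "sex_x_age_sq": sex * age ** 2,
        "sex_x_height": sex * height,
        "sex_x_height_centered_sq": sex * height_centered_sq,
    }


# Hypothetical participant: 50 years old, 170 cm, sex coded 1, sample mean 168 cm.
print(covariate_terms(50, 170.0, 1, 168.0))
```

Centering height before squaring reduces the collinearity between the linear and quadratic height terms.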

Meta-analysis of adult cohort results

The fixed-effect meta-analysis weighted by the inverse of variance was completed using METAL on the 119 CpG sites common to all three cohorts [25]. Associations with P-values < 0.05 were considered nominally significant. Since methylation of CpGs in the SERPINA cluster did not correlate throughout the whole genetic region (Fig. 2), the total number of CpGs was used as correction for multiple testing. A Bonferroni-corrected P-value < 4.2 × 10⁻⁴ was considered statistically significant. Given the generally medium-to-high correlation between lung function parameters, associations of CpGs with different lung function parameters were not considered independent [7].
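
The pooling and the significance threshold described above can be illustrated with a short sketch. It implements the textbook inverse-variance fixed-effect estimator and a normal-approximation P-value; it is not METAL's code, and the function name and the cohort estimates below are hypothetical.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted pooled estimate, its SE and a two-sided P-value."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided, standard normal
    return pooled, pooled_se, p_value


N_CPGS = 119
bonferroni_threshold = 0.05 / N_CPGS  # approximately 4.2e-4

# Hypothetical cohort-specific estimates and standard errors for one CpG.
betas = [0.10, 0.14, 0.09]
ses = [0.05, 0.06, 0.07]
pooled, pooled_se, p = fixed_effect_meta(betas, ses)
print(pooled, pooled_se, p, p < bonferroni_threshold)
```

Cohorts with smaller standard errors receive larger weights, so the pooled estimate is dominated by the most precisely estimated cohort-specific associations.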

Cohort-specific analysis on lung function and its growth in children and adolescents

Four regression models equivalent to those assessed in adult cohorts were run to investigate cross-sectional and predictive associations of methylation at 119 CpGs in the SERPINA gene cluster with the absolute level and increase in FEV1, FVC, and FEV1/FVC in children and adolescents. The models were a priori adjusted for covariates associated with lung function at P-value < 0.05: study center, age, mother’s education, height, (height-mean(height))², sex, sex*age, sex*height, and estimated cell composition (CD8T; CD4T; NK; Bcell; Mono; Eos) [23].

Sensitivity analyses

Seven sensitivity analyses were conducted in SAPALDIA: (1) 119 CpGs from T1 were regressed on circulating AAT measured at T1 [24], including the same set of covariates as in the cross-sectional models on lung function at T1; (2) cross-sectional models at T1 were additionally adjusted for CRP and AAT; (3) for all phenotypes, all four statistical models were additionally adjusted for the presence of PiS and PiZ (0, 1 or 2 alleles); (4) additionally adjusted for neutrophils; (5) stratified by gender; (6) stratified by obesity (BMI < 30 vs. ≥ 30 kg/m²); and (7) stratified by self-reported asthma (“Have you ever had asthma?”).

Results

Characteristics of adult cohort participants

Characteristics of the adult participants at the first time point (T1) and second time point (T2) are presented in Table 1. SAPALDIA participants were oldest, reported the highest number of pack-years, had lung function levels similar to ECRHS but lower than NFBC, and exhibited the highest prevalence of COPD, based on pre-bronchodilation spirometry and defined as FEV1/FVC below the lower limit of normal: 15.3% (T1) and 17.1% (T2) in SAPALDIA, compared to 10.5% (T1) and 12.7% (T2) in ECRHS and 2.2% (T1) and 11.4% (T2) in NFBC. The prevalence of self-reported doctor-diagnosed asthma was between 10 and 20%, increasing with age in all three cohorts.

Table 1 Population characteristics of adult cohorts

Association of methylation at 119 CpGs in the SERPINA gene cluster with lung function in adult ever smokers

SERPINA1

Of the 119 CpGs, 17 were located in the SERPINA1 gene. DNAm at these 17 sites was not associated with any of the three lung function parameters at a Bonferroni-corrected P-value, irrespective of the model considered (Tables 2 and 3; Additional file 1: Table ES1). However, for FEV1, meta-analysis revealed three nominally significant associations at T1 (cg09968361, cg25968219, cg04179148) (Table 2). The positive association of cg09968361 with levels of FEV1, nominally significant at T1 (β:1.43, P-value = 0.04) was consistent in direction for T2 and the repeat cross-sectional analysis (β:0.72, P-value = 0.02). However for change in lung function, an increase in methylation at this CpG site was associated with accelerated FEV1 decline (β:-0.10, P-value = 0.011). For FEV1/FVC (Table 3), meta-analysis also revealed three nominally significant associations at T1 (cg25968219, cg24621042, cg04179148), which in part overlapped with those associated with FEV1, also in terms of direction of association. No association of methylation at SERPINA1 CpGs and FVC was observed (Additional file 1: Table ES1).

Table 2 Results from Meta-Analyses, FEV1 and AAT, Adult Cohorts
Table 3 Results from meta-analysis, FEV1/FVC and AAT, adult cohorts

SERPINA gene cluster

Results from the meta-analysis on the association of methylation at the 119 CpGs with cross-sectional lung function and lung function decline for FEV1, FVC and FEV1/FVC are presented in Additional file 1: Tables ES2-S4. A single CpG, cg08257009, located 32 kb downstream of SERPINA1, withstood Bonferroni correction for multiple testing. Methylation at this site was positively associated with FEV1/FVC in the repeat cross-sectional analysis (β:0.11; P-value = 2.6 × 10⁻⁴) (Additional file 1: Table ES3). The associations of this signal at the two cross-sectional time points were comparable (β:0.10; P-value = 0.01 at T1; β:0.14; P-value = 2.2 × 10⁻³ at T2). Consistent with the observation that higher methylation at this site was associated with a higher level of FEV1/FVC cross-sectionally, it also predicted attenuated decline of FEV1/FVC (β:0.0089; P-value = 9.1 × 10⁻³).

Association of methylation at 119 CpGs with circulating alpha-1 antitrypsin in adult ever smokers in SAPALDIA

None of the associations between methylation and circulating AAT in SAPALDIA participants reached multiple-testing-corrected statistical significance, but nominal statistical significance was observed at four CpG sites in the SERPINA1 gene (Tables 2 and 3) and at two additional sites outside SERPINA1 (Additional file 1: Tables ES2-S4), one of which was cg08257009, the only lung function associated signal withstanding multiple testing. In all instances, higher methylation was associated with lower AAT concentrations. These inverse associations with circulating AAT did not translate into statistically significant and inverse associations with any of the lung function parameters measured.

Association of methylation at 119 CpGs with lung function in tobacco-smoke exposed children and adolescents

Characteristics of ALSPAC children and adolescents at the first (T1) and second (T2) time points are presented in Table 4. Height and weight increased during follow-up from a mean of 133 to 170 cm and from 31 to 63 kg, respectively. Asthma was present in 22% and 29% of children and adolescents, respectively, and at the second time point the majority (86%) of adolescents reported intake of asthma medication in the last 12 months. While FEV1 and FVC increased during follow-up, FEV1/FVC remained stable on average.

Table 4 Characteristics of tobacco-smoke exposed children and adolescents, ALSPAC

SERPINA1

Focusing on the methylation sites in SERPINA1 (Additional file 1: Tables ES5-S7), again none of the signals reached Bonferroni-corrected significance in association with any lung function parameter or model. There were several nominally statistically significant associations, particularly for lung function growth (FEV1: cg26938334; FVC: cg26938334, cg13826459, cg24621042; FEV1/FVC: cg10070185). Methylation at cg26938334 was inversely associated with growth in FEV1 and FVC, and methylation at cg24621042 and cg10070185 was inversely associated with growth in FVC and in FEV1/FVC, respectively. The associations of cg24621042 with FEV1/FVC change were inconsistent with those observed for FEV1/FVC decline in adults.

SERPINA gene cluster

Results from meta-analysis on the association of methylation at all 119 CpGs with cross-sectional lung function and lung function growth of FEV1, FVC and FEV1/FVC are presented in Additional file 1: Tables ES5-S7. No methylation signals remained statistically significant after Bonferroni correction. Methylation at cg08257009, 32 kb downstream of the SERPINA1 gene, which showed a statistically significant positive association with FEV1/FVC cross-sectionally in the adult cohorts, was not associated with FEV1 or FEV1/FVC in children. However, there was evidence for an inverse, rather than a positive, association with FVC at T1 (β:-0.91, P-value = 4.4 × 10⁻³, Additional file 1: Table ES6). Overall, none of the SERPINA1 CpGs showed consistent associations across cross-sectional and longitudinal models.

Replication of previously reported COPD signals

Relative hypomethylation of cg02181506 and cg24621042 was previously associated with COPD [9]. Table 5 summarizes the association of both signals with circulating AAT in SAPALDIA and with lung function and its change in adults as well as children. Contrary to expectation, relative hypermethylation, not hypomethylation, at these two sites was associated with lower circulating AAT. For cg02181506, irrespective of age, no association of methylation with either FEV1/FVC or FEV1 was observed. For cg24621042, in adults, hypermethylation was positively associated with FEV1/FVC at T1 only, but not in the repeat cross-sectional analysis, and the inverse association of methylation at cg24621042 with FVC change was only observed in children. No lung function associations reached statistical significance at the nominal P-value < 0.05.

Table 5 Association of previously reported COPD associated cg02181506 and cg24621042

Sensitivity analyses in the SAPALDIA cohort

In the absence of consistent associations between CpGs and lung function, we restricted sensitivity analyses to the associations of FEV1/FVC with the CpG withstanding multiple testing (cg08257009) and with the two CpGs (cg02181506 and cg24621042) previously associated with COPD [9] (Table 6, Additional file 1: Tables ES8-S9). Sensitivity analyses did not reveal a more consistent pattern of associations between methylation at these three sites and lung function.

Table 6 Sensitivity Analyses of Association between FEV1/FVC and Methylation at cg02181506, cg24621042, and cg08257009, SAPALDIA Cohort

Discussion

This is the first study to investigate the association of DNAm in the SERPINA gene cluster with lung function and its longitudinal change in ever-smoking adults from three European population-based adult cohort studies and in tobacco-smoke exposed children and adolescents from England. We observed that methylation at cg08257009, which is not annotated to a gene in the SERPINA gene cluster and is located 32 kb downstream of the SERPINA1 gene, was significantly associated with FEV1/FVC at the Bonferroni-corrected level. No methylation signals in the SERPINA1 gene showed associations with lung function level or change over time after correcting for multiple testing.

Few studies have reported associations between DNAm and lung function or COPD. Our results obtained in general population samples contradict the previous finding of an association between hypomethylation at two CpG sites in SERPINA1 and COPD risk in two family-based studies. The first of these studies was restricted to smokers and the second consisted of participants with and without a history of smoking [9]. The functional relevance of the two CpG sites remains unclear, given that the associations of hypomethylation with COPD risk and circulating AAT are inconsistent in direction. Consistent with the inverse association observed with AAT in the blood, hypomethylation of the AAT gene has been associated with increased gene expression in rat models [26].

Our results are consistent with those of a study comparing gene methylation in lung tissue of former-smoker COPD patients and controls with normal lung function [27, 28], in which methylation in SERPINA1 was not associated with COPD risk. Similarly, no cross-sectional association with post-bronchodilation lung function was observed for SERPINA1 gene methylation measured in blood samples from a small rural Korean COPD case-control study [29]. Furthermore, in a study of middle-aged monozygotic twins, DNAm in SERPINA1 was not associated with intra-pair differences in lung function decline [30].

Smoking exerts strong adverse effects on lung function and increases COPD risk. In a small proportion of smokers, it interacts with genetically determined rare AAT deficiency to cause COPD. The adverse effects of smoking on lung tissue may therefore be mediated by altered DNAm in SERPINA1. Yet the largest epigenome-wide association study for smoking did not identify a relevant association between epigenetic signatures in SERPINA1 and smoking [31]. Interestingly, cg08257009, which reached Bonferroni-corrected significance in the repeat cross-sectional analysis, was previously reported as a smoking-related CpG in buccal cells in epithelial cancer [32]. In addition, cg08257009 and cg02181506 were reported to be smoking-related signals in blood-derived DNA samples from participants in 16 cohorts, with cg02181506 showing weaker associations [31].

The use of blood instead of lung tissue to assess lung disease related methylation may be considered a limitation of the study. However, as previously discussed, COPD is a systemic disease related to low-grade inflammation, which supports studying the peripheral blood methylome [9]. Teschendorff and others demonstrated the correlation between smoking-related DNAm in blood and lung tissue as well as between normal and malignant lung tissue. Their results also pointed to the prognostic value of these signals in lung cancer patients [32, 33, 34].

Additional limitations of the study include the lack of cross-omics data to investigate the biological network related to AAT more comprehensively. This is important, given that the most recent lung function GWAS suggests an important role of elastic-fibre pathways [7]. Furthermore, misclassification of smoking exposure could have biased the observed associations, most likely towards the null.

The current study has several strengths. First, the studies providing data for this investigation are well-established population-based respiratory cohorts. They are known for high-quality lung function testing and large sample sizes, and they share similarities in study protocols. Methylome analyses followed stringent quality control. Second, the study is based on longitudinal lung function data and prospectively and repeatedly measured DNAm. Repeated assessment of lung function and predictors may have helped improve the statistical power of the cross-sectional analysis. Prospective assessment of DNAm in relation to change in lung function decreases the problem of reverse causation. Third, the integration of lung function data from both adults and children offered the opportunity to study SERPINA1 methylation and lung function over the life course and to investigate whether relevant time windows for genome-environment interactions may exist.

Conclusion

In conclusion, this first comprehensive study on DNAm in the SERPINA gene cluster provides weak evidence of an association with lung function and its change across the life course. Larger studies based on post-bronchodilation lung function need to be followed by investigations of the associations of SERPINA1 and elastin-related pathways and networks through cross-omics approaches.