Background

COPD, a leading cause of morbidity and mortality, is characterized by persistent airflow limitation and phenotypic heterogeneity. While cigarette smoking is a major risk factor for COPD, the response to cigarette smoke is highly variable [1]. Chronic bronchitis (CB) and emphysema represent two classic phenotypes of COPD [2]. However, CB, which is defined clinically by chronic cough and phlegm, can occur in the absence of COPD [3]. Some studies have suggested that CB and emphysema have different genetic determinants [4],[5]. CB has been reported to be associated with frequent respiratory exacerbations, increased respiratory symptoms, poor quality of life, and even increased mortality [6]-[8].

Although candidate gene testing and linkage analysis have been used to search for CB-related genetic determinants in selected populations [9],[10] and recently a genome-wide association meta-analysis has reported genetic variants associated with chronic mucus hypersecretion mainly in subjects from the general population [11], genome-wide association studies (GWAS) of CB within COPD subjects have not been reported. Our primary hypothesis was that genetic variants would be associated with COPD-related CB. We also hypothesized that genetic heterogeneity exists according to the presence or absence of CB within COPD subjects. We addressed these hypotheses by comparing COPD subjects with CB to smokers with normal spirometry and to COPD subjects without CB as control groups.

Methods

Study cohorts

Subjects were current and former smokers from three studies: the non-Hispanic whites (NHWs) from the COPDGene Study (NCT00608764 at https://clinicaltrials.gov); GenKOLS (Bergen, Norway); and the Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints (ECLIPSE, NCT00292552 at https://clinicaltrials.gov). All subjects had self-described European white ancestry. The design and procedures for each participating study have been previously described [12]-[14]. For supplementary analysis, the African Americans (AAs) of the COPDGene Study were included. Institutional review board approval was obtained at each participating clinical center; all subjects provided written informed consent. This study was approved by the Partners HealthCare Institutional Review Board (COPDGene, 2007P000554; GenKOLS, 2009P000790; ECLIPSE, 2005P002467).

Variable definitions

CB was defined as chronic productive cough for 3 months in each of 2 successive years [15]. CB COPD cases were defined as having both CB and COPD of at least spirometry grade 2 (post-bronchodilator FEV1/FVC <0.7 and FEV1 < 80% predicted), defined by the Global initiative for chronic Obstructive Lung Disease (GOLD 2-4) [16]. For CB COPD cases, primary analysis used current or former smokers with normal spirometry (post-bronchodilator FEV1/FVC ≥0.7 and FEV1 ≥ 80% predicted) as a control group. A secondary analysis was performed using COPD subjects without CB as controls to explore genetic heterogeneity within COPD subjects. Additionally, we performed GWAS of COPD subjects without CB relative to smoking controls for comparison to our results in COPD CB subjects (Figure 1). Additional variable definitions for complementary analyses are available in an online supplement.
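The group definitions above can be sketched as a small classification function. This is only an illustration of the stated thresholds; the function name, argument names, and group labels are hypothetical and are not taken from the study's analysis code.

```python
from typing import Optional


def classify_subject(fev1_fvc: float, fev1_pct_pred: float, has_cb: bool) -> Optional[str]:
    """Assign an analysis group from post-bronchodilator spirometry and CB status.

    fev1_fvc      -- post-bronchodilator FEV1/FVC ratio (e.g. 0.65)
    fev1_pct_pred -- post-bronchodilator FEV1 as % of predicted (e.g. 55.0)
    has_cb        -- chronic productive cough for 3 months in each of 2 successive years
    """
    if fev1_fvc < 0.7 and fev1_pct_pred < 80:
        # GOLD 2-4 COPD, split by chronic bronchitis status
        return "CB_COPD" if has_cb else "COPD_without_CB"
    if fev1_fvc >= 0.7 and fev1_pct_pred >= 80:
        # Normal spirometry: smoking control (CB may still be present)
        return "smoking_control"
    # Any other spirometry pattern falls outside the three groups
    return None


print(classify_subject(0.55, 60.0, True))    # CB COPD case
print(classify_subject(0.55, 60.0, False))   # COPD without CB
print(classify_subject(0.78, 95.0, False))   # smoking control
```

Subjects who meet neither the GOLD 2-4 nor the normal-spirometry definition return `None` in this sketch, reflecting that they belong to none of the groups defined above.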

Figure 1

Genome-wide association study design for chronic bronchitis. Definition of abbreviations: CB = chronic bronchitis; COPD = chronic obstructive pulmonary disease; GOLD = Global initiative for chronic Obstructive Lung Disease. GOLD 2-4 was defined as having a post-bronchodilator FEV1/FVC < 0.7 and FEV1 < 80% predicted. Normal spirometry was defined as having a post-bronchodilator FEV1/FVC ≥ 0.7 and FEV1 ≥ 80% predicted.

Genotyping quality control and imputation

Genotyping was performed using Illumina platforms [HumanOmniExpress for the COPDGene cohort, the HumanHap 550 (V1, V3, and Duo) for the GenKOLS cohort, and HumanHap 550V3 for the ECLIPSE cohort; Illumina, Inc., San Diego, CA]. Genotype imputation on the COPDGene cohorts was performed using MaCH [17] and minimac [18], with the 1000 Genomes [19] Phase I v3 European (EUR) reference panel for NHWs and the cosmopolitan reference panel for AAs. Details on genotyping quality control and imputation for the GenKOLS and ECLIPSE cohorts have been previously described [5],[14],[20]-[23]. Variants were included in the analysis only if they passed genotyping or imputation quality control in all three cohorts.
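The inclusion rule for variants amounts to a set intersection across cohorts. A minimal sketch follows, assuming each cohort's QC-passing variants are available as a set of identifiers; the function name and the placeholder variant IDs are hypothetical.

```python
from typing import Dict, Set


def variants_passing_all(qc_pass_by_cohort: Dict[str, Set[str]]) -> Set[str]:
    """Return the variants that passed QC in every cohort."""
    cohort_sets = list(qc_pass_by_cohort.values())
    if not cohort_sets:
        return set()
    return set.intersection(*cohort_sets)


# Placeholder variant IDs, for illustration only
qc_pass = {
    "COPDGene_NHW": {"var1", "var2", "var3"},
    "GenKOLS": {"var1", "var3", "var4"},
    "ECLIPSE": {"var1", "var2", "var3", "var4"},
}
print(sorted(variants_passing_all(qc_pass)))
```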

Statistical analysis

Logistic regression analysis of SNPs under an additive model of inheritance with case-control status as the outcome was performed in each cohort with adjustment for age, gender, pack-years of cigarette smoking and genetic ancestry-based principal components using PLINK 1.07 [24], as previously described [21]-[23]. Imputed genotypes were analyzed in a similar manner, using SNP dosage data in PLINK 1.07 [24]. We performed fixed-effects meta-analysis [25] using METAL (version 2011-3-25) [26] and R 2.15.1 (www.r-project.org) with the meta package. Heterogeneity was assessed by calculating both I² [27] and P values for Cochran’s Q. Because the fixed-effects model relies on inverse-variance-weighted effect sizes, in genomic regions with evidence of genetic heterogeneity we also used a modified random-effects model optimized to detect associations under heterogeneity [28]. Genomic inflation factors [29] were calculated using GenABEL [30]. We used LocusZoom [31] to generate regional association plots, using the 1000 Genomes EUR reference data to calculate linkage disequilibrium (LD).
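For readers unfamiliar with the pooling step, the following is a minimal sketch of inverse-variance-weighted fixed-effects meta-analysis with Cochran's Q and I², written with the Python standard library. It is not the METAL or R code used in the study; the function name and the input values are illustrative assumptions, with each cohort contributing a per-allele log odds ratio and its standard error.

```python
import math
from typing import Dict, Sequence


def fixed_effects_meta(betas: Sequence[float], ses: Sequence[float]) -> Dict[str, float]:
    """Pool per-cohort log odds ratios with inverse-variance weights."""
    if len(betas) != len(ses) or len(betas) < 2:
        raise ValueError("need matching effects and standard errors from at least 2 cohorts")
    weights = [1.0 / se ** 2 for se in ses]           # weight = 1 / variance
    w_sum = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / w_sum
    se = math.sqrt(1.0 / w_sum)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))            # two-sided normal P value
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    df = len(betas) - 1
    # I^2: percentage of variation attributed to heterogeneity, truncated at 0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return {"beta": beta, "se": se, "or": math.exp(beta), "p": p, "q": q, "i2": i2}


# Illustrative inputs only (not values from the study)
result = fixed_effects_meta([0.25, 0.30, 0.10], [0.08, 0.12, 0.09])
print(result)
```

Under homogeneity, Q follows approximately a chi-square distribution with (number of cohorts − 1) degrees of freedom, which is how the heterogeneity P value is obtained.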

We used permutation testing [23] to assess differences in ORs of previously known genome-wide significant SNPs between the two meta-analyses.

Results

GWAS of CB COPD relative to smokers with normal spirometry

Baseline characteristics of each of the three primary cohorts are summarized in Table 1.

Table 1 Baseline characteristics of COPD subjects with chronic bronchitis and smokers with normal spirometry as a control group

For the primary analysis of CB COPD relative to smokers with normal spirometry, the combined GWAS of three cohorts included 1,662 CB COPD cases and 3,520 controls. The quantile-quantile (Q-Q) plot showed no evidence of significant residual population stratification (Figure 2A; lambda = 1.03). Figure 3A shows a genome-wide significant association within the previously reported COPD susceptibility region on chromosome 4q22.1 in FAM13A and a second genome-wide significant association in a novel region on 11p15.5. The results for the most significant SNPs at each of these loci are listed in Table 2. Figure 4 displays the regional association plots for these two regions. The top 12 SNPs in the meta-analysis were located on 4q22.1 (FAM13A) and were either identical to, or in strong LD (r² ≥ 0.97) with, the top SNPs previously described in GWASs of pulmonary function [32],[33] and COPD [21]-[23].
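As background for the lambda values reported with the Q-Q plots, the sketch below shows one common way to estimate a genomic inflation factor: the median of the observed 1-df chi-square statistics divided by the expected median under the null (about 0.455). This is an assumption about the estimator for illustration only; GenABEL also offers a regression-based estimate, and the text does not state which was used. The function name and inputs are hypothetical.

```python
import statistics
from typing import Iterable

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation(z_scores: Iterable[float]) -> float:
    """Median-based genomic inflation factor from per-SNP z statistics."""
    chi2 = [z * z for z in z_scores]   # 1-df chi-square statistic per SNP
    if not chi2:
        raise ValueError("at least one test statistic is required")
    return statistics.median(chi2) / CHI2_1DF_MEDIAN
```

A value close to 1, as reported here, indicates that the bulk of test statistics follow the null distribution; values well above 1 would suggest residual stratification or other systematic inflation.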

Figure 2

The quantile–quantile plots for the three-cohort meta-analysis including 1000 Genomes project imputed data for (A) COPD subjects with chronic bronchitis (CB) versus smoking controls and (B) CB versus no CB within COPD subjects, after adjustment for age, sex, pack-years of cigarette smoking and genetic ancestry using principal components.

Figure 3

Manhattan plots of –log10 P values for meta-analysis of three cohorts for (A) COPD subjects with chronic bronchitis (CB) versus smoking controls and (B) CB versus no CB within COPD subjects, after adjustment for age, sex, pack-years of cigarette smoking and genetic ancestry using principal components.

Table 2 Top results of the meta-analysis for COPD subjects with chronic bronchitis versus smokers with normal spirometry in COPDGene non-Hispanic white, GenKOLS, and ECLIPSE studies *
Figure 4

Local association plots for significant loci in the meta-analysis of cases with chronic bronchitis and COPD versus smoking control subjects in COPDGene non-Hispanic whites, GenKOLS, and ECLIPSE. A. rs2869967 on chromosome 4q22.1. B. rs34391416 on 11p15. The x-axis is chromosomal position, and the y-axis shows the –log10 P value. The most significant SNP at each locus is labeled in purple, with other SNPs colored by degree of linkage disequilibrium (r²). Plots were created using LocusZoom.

The novel locus on 11p15.5 encompasses a region where three genes are annotated: EF-hand calcium binding domain 4A (EFCAB4A), chitinase domain containing 1 (CHID1), and adaptor-related protein complex 2, alpha 2 subunit (AP2A2). The most significant SNP at this locus was rs34391416 (EFCAB4A), with a P value of 4.99 × 10⁻⁸. There was some evidence of heterogeneity (P = 0.01 for Cochran’s Q, I² = 79). However, a meta-analysis using a modified random-effects model showed more significant P values: P = 1.66 × 10⁻⁸ at rs34391416 (EFCAB4A), P = 7.56 × 10⁻⁹ at rs147862429 (CHID1), and P = 1.11 × 10⁻⁸ at rs143705409 (AP2A2).
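The two heterogeneity statistics quoted for rs34391416 can be related to each other with a short back-calculation. The sketch below assumes the usual definition I² = 100 × (Q − df)/Q with df = 2 for three cohorts, and uses the closed-form upper tail of a chi-square distribution with 2 degrees of freedom, exp(−Q/2). The function names are hypothetical, and the reported I² is rounded, so the result is approximate.

```python
import math


def q_from_i2(i2_percent: float, n_studies: int) -> float:
    """Invert I^2 = 100 * (Q - df) / Q to recover Cochran's Q (valid for 0 < I^2 < 100)."""
    df = n_studies - 1
    frac = i2_percent / 100.0
    if not 0.0 < frac < 1.0:
        raise ValueError("I^2 must lie strictly between 0 and 100 for this inversion")
    return df / (1.0 - frac)


def chi2_sf_2df(q: float) -> float:
    """Upper-tail probability of a chi-square variable with exactly 2 degrees of freedom."""
    return math.exp(-q / 2.0)


q = q_from_i2(79.0, 3)     # about 9.5
p_het = chi2_sf_2df(q)     # about 0.009, which rounds to the reported 0.01
print(round(q, 2), round(p_het, 4))
```

Under these assumptions the reported I² of 79 and heterogeneity P value of 0.01 are mutually consistent.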

GWAS of CB COPD relative to COPD subjects without CB

A GWAS of CB within COPD subjects from the three studies included the same 1,662 CB COPD cases and 3,777 COPD subjects without CB as a control group. Table 3 shows the baseline characteristics of these COPD subjects, and the corresponding Q-Q plot is shown in Figure 2B (lambda = 1.01). We found a novel suggestive locus on 1q23.3, which did not reach genome-wide significance (rs114931935, P = 4.99 × 10⁻⁷, Table 4 and Figure 3B). This locus includes ribosomal protein L31 pseudogene 11 (RPL31P11) and activating transcription factor 6 (ATF6) (Figure 5).

Table 3 Baseline characteristics of COPD subjects with chronic bronchitis (CB) and those without CB within each cohort
Table 4 Top results of the meta-analysis for chronic bronchitis (CB) versus no CB within COPD subjects of COPDGene non-Hispanic white, GenKOLS, and ECLIPSE studies *
Figure 5

Local association plots for the top two loci in the meta-analysis of COPD subjects with chronic bronchitis versus COPD subjects without chronic bronchitis in COPDGene non-Hispanic whites, GenKOLS, and ECLIPSE. A. rs114931935 on 1q23. B. rs924777 on 1q23. The x-axis is chromosomal position, and the y-axis shows the –log10 P value. The most significant SNP at each gene is labeled in purple, with other SNPs colored by degree of linkage disequilibrium (r²). Plots were created using LocusZoom.

Since the GWAS in COPDGene NHWs for COPD with CB versus COPD without CB identified a genome-wide significant SNP, rs12692398 on 2p25.1, we performed a meta-analysis of two studies (COPDGene NHWs and GenKOLS), in which the same SNP was also genome-wide significant (Additional file 1: Table S1 and Additional file 1: Figure S1). It is located within cystin-1 (CYS1), which encodes a cilia-associated protein. This SNP did not show evidence of association with CB within ECLIPSE COPD cases.

Complementary analyses

To explore whether our results were similar when including an additional racial group, a supplemental meta-analysis of four cohorts, adding the AAs of COPDGene, was performed for CB COPD relative to smoking controls. Additional file 1: Table S2 shows the baseline characteristics of AAs of COPDGene. The meta-analysis including 1,844 cases and 5,269 controls revealed results similar to those of the three cohorts, with the exception of SNPs in both CHID1 and AP2A2, which were excluded because of their rarity in AA subjects (minor allele frequency < 0.01). The novel top SNP, rs34391416 (EFCAB4A), was genome-wide significant (OR = 1.93, P = 2.66 × 10⁻⁸).
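To give a sense of the precision implied by an odds ratio and P value such as those reported for rs34391416, the sketch below back-calculates an approximate standard error and confidence interval. It assumes the P value came from a two-sided Wald z-test on the log odds ratio, which may not match the exact test used in the meta-analysis; the function name is hypothetical, and the output is a rough illustration rather than a reported result.

```python
import math
from statistics import NormalDist
from typing import Dict


def wald_summary(odds_ratio: float, p_value: float, level: float = 0.95) -> Dict[str, float]:
    """Approximate z, SE(log OR) and confidence interval from an OR and a two-sided P value."""
    if odds_ratio <= 0 or odds_ratio == 1.0:
        raise ValueError("odds ratio must be positive and different from 1")
    beta = math.log(odds_ratio)
    z = -NormalDist().inv_cdf(p_value / 2.0)        # |z| for a two-sided P value
    se = abs(beta) / z
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return {
        "z": z,
        "se": se,
        "ci_low": math.exp(beta - crit * se),
        "ci_high": math.exp(beta + crit * se),
    }


summary = wald_summary(1.93, 2.66e-8)
print(summary)
```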

Since CB was present in some of our smoking controls, a GWAS of CB COPD versus smoking controls without CB (n = 3,101) was performed for each of our three cohorts and then meta-analyzed. These results were similar, although the novel top SNP on 11p15.5 was slightly reduced in statistical significance (rs34391416, OR = 1.98, P = 6.50 × 10⁻⁸). However, a meta-analysis of four cohorts including AAs of COPDGene (n = 4,628) showed genome-wide significance of the same SNP (OR = 1.98, P = 2.76 × 10⁻⁸). Baseline characteristics of smoking controls without CB are summarized in Additional file 1: Table S3.

Because COPD subjects with CB were more likely to be current smokers, complementary meta-analyses were performed with adjustment for current smoking status as well as age, gender, pack-years of cigarette smoking and genetic ancestry-based principal components. In meta-analyses using smokers with normal spirometry as a control group, FAM13A SNPs remained genome-wide significant. One of the previously reported COPD risk loci, 15q25, was nearly genome-wide significant (P = 6.58 × 10⁻⁸). However, the novel SNP on 11p15 (rs34391416) was not genome-wide significant (P = 5.25 × 10⁻⁷ in three Caucasian cohorts and P = 2.60 × 10⁻⁷ in four cohorts including AAs, Additional file 1: Table S4). On the other hand, a meta-analysis using COPD subjects without CB as a control group, with adjustment for current smoking status, provided lower (but not genome-wide significant) P values of top SNPs from the secondary analysis of COPD with CB vs. COPD without CB (Additional file 1: Table S5).

We assessed the top SNPs of CB COPD susceptibility relative to smokers with normal spirometry (the primary meta-analysis) in the results of the secondary meta-analysis of CB vs. no CB within COPD subjects. The novel SNPs on 11p15.5 were nominally significant (P < 0.01), whereas SNPs near FAM13A were not significant (P > 0.1) (Additional file 1: Table S6).

Clinical and radiological characteristics were compared according to genotypes of rs34391416 among all COPDGene NHW subjects (Additional file 1: Table S7). There were significant differences in parameters related to airway disease, including airway wall area% on inspiratory chest CT scans and gas trapping on expiratory CT. There were no differences in emphysema severity or distribution related to this SNP.

Since the meta-analysis of CB COPD relative to smoking controls showed FAM13A as the top gene, we performed additional analyses to ascertain whether SNPs near FAM13A had different levels of statistical significance between COPD with CB and COPD without CB. A meta-analysis of GWASs for COPD subjects without CB relative to smoking controls (Figure 1) also showed FAM13A as the top gene, which was followed by HHIP and IREB2 (Additional file 1: Table S8 and Additional file 1: Figure S2). ORs and P values of previously known COPD risk alleles among our results from meta-analyses for CB COPD or COPD without CB are summarized in Table 5. Permutation testing revealed that differences of ORs between our two meta-analyses were statistically significant at four SNPs in FAM13A.

Table 5 Meta-analysis results of COPD with chronic bronchitis (CB) vs. smoking controls and COPD without CB vs. smoking controls for COPD risk alleles previously demonstrated in genome-wide association studies of COPD vs. smoking controls

Discussion

Our GWAS meta-analysis of three studies of COPD subjects with CB relative to smoking controls not only reconfirmed previously known genome-wide significant SNPs in FAM13A related to lung function [32],[33] and COPD [21]-[23], but also revealed a novel locus on 11p15.5, including EFCAB4A, CHID1, and AP2A2. Proteins encoded by one or more of these three genes could be involved in CB. Interestingly, this new region is located next to MUC6 and MUC2 (Figure 4B) [34]. Thus, it is also possible that this genomic region influences regulation of mucin genes to alter susceptibility to CB.

EFCAB4A encodes a protein involved in store-operated Ca²⁺ entry. Intracellular Ca²⁺ was reported to regulate MUC2 expression [34],[35] and mucin secretion from airway goblet cells [36]. In addition, a study demonstrated increased intracellular Ca²⁺ levels in lymphocytes of COPD patients, which correlated positively with the spirometric grade of COPD [37]. Gene expression microarray analysis of human bronchial epithelial cells identified overexpression of EFCAB4A during mucociliary differentiation [38]. While quantitative RT-PCR revealed high expression of EFCAB4A in lung [39], a role for EFCAB4A in CB remains to be defined.

Even though SNPs near CHID1 in the meta-analysis of three GWASs did not show genome-wide significance, rs147862429 was the most significant SNP in a GWAS of COPDGene NHWs, with a P value of 2.90 × 10⁻¹⁰. CHID1 encodes a saccharide- and LPS-binding protein, also called stabilin-1 interacting chitinase-like protein (S1-CLP), with possible roles in pathogen sensing and endotoxin neutralization [40]. It is expressed in cells of monocytic, T lymphocyte, B lymphocyte, and epithelial origin, and it is up-regulated by the Th2 cytokine interleukin-4 and dexamethasone in macrophages [41]. Other human chitinase and chitinase-like proteins were previously suggested to play a role in the development of COPD [42]. Chitotriosidase (CHIT1) levels were elevated in the bronchoalveolar lavage fluid of smokers with COPD [43]. A chitinase-like protein, commonly known as YKL-40, was also increased in the lungs of COPD patients [44]. A recent study demonstrated genetic associations between chitinase gene variants and lung function level and rate of decline in COPD patients from the Lung Health Study [45]. Therefore, CHID1 may be involved in the pathogenesis of CB.

AP2A2 encodes adaptor protein complex 2 subunit alpha-2, which has been shown to participate in the endocytosis of clathrin-coated vesicles through interaction with epsin-1 [46] and in receptor endocytosis with SHC-transforming protein 1 [47]. One study demonstrated long-range interactions between the MUC2 promoter and the adjacent AP2A2 gene by using quantitative chromosome conformation capture (q3C) [48]. Although human respiratory tract mucus contains mainly MUC5AC and MUC5B along with smaller amounts of MUC2, the distribution of MUC2 variable number tandem repeat (VNTR) alleles was reported to be different between asthmatics and non-asthmatics [49]. A follow-up study demonstrated relatively strong LD between SNPs in MUC2 and MUC5AC [50]. Therefore, AP2A2, either alone or through interactions with MUC2, may have a potential role in CB pathogenesis.

In the primary meta-analysis of CB COPD relative to smoking controls, we found the strongest signal within FAM13A rather than the other known COPD susceptibility genes, and permutation testing confirmed that ORs of FAM13A SNPs were significantly higher than those for non-CB COPD. While COPD is a complex disease with marked phenotypic heterogeneity, most previous genetic studies have dealt with COPD subjects as one homogeneous group [20]-[22]. The current study suggests that previously identified COPD risk alleles might have different effects on the development of different COPD subtypes.

Although our secondary meta-analysis of CB COPD relative to COPD without CB within the COPD population failed to demonstrate genome-wide significant SNPs, the fourth most significant SNP, rs2298019, was previously identified as an expression quantitative trait locus (eQTL) for ATF6 in lung tissue [51], with the risk allele associated with decreased expression. ATF6 plays a major role in transcriptional repression of endogenous cystic fibrosis transmembrane conductance regulator (CFTR) under endoplasmic reticulum stress [52] and is thought to be a potential therapeutic target for cystic fibrosis (CF) [53]. In addition to CF, suppressed CFTR function has been reported in cigarette smokers and COPD patients without CF [54],[55]. Recently, roflumilast, approved to reduce COPD exacerbations in COPD patients with CB, has been reported to activate CFTR [56]. Since ATF6 is closely connected with CFTR, genetic variants of ATF6 may play a role in the pathogenesis of CB.

We found that SNPs in another gene (CYS1) on 2p25.1 demonstrated suggestive associations for CB. CYS1 is enriched in the ciliary axoneme, and high expression in the kidney and weak expression in the lung were reported [57]. While the top SNP, rs12692398 of CYS1, reached the genome-wide significance threshold in both a GWAS of only COPDGene NHWs and a meta-analysis of COPDGene NHWs and GenKOLS, it lost significance in the meta-analysis of all three cohorts (P = 1.66 × 10⁻⁴). It is unclear why the association evidence for CB of this genomic region within ECLIPSE was negative. In the meta-analysis of three cohorts, the other two SNPs of CYS1, rs13000481 and rs4574084, showed P values of 1.74 × 10⁻⁵ and 2.61 × 10⁻⁵, respectively, and LD between these two SNPs is high (0.94).
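As a reminder of what a pairwise LD value of this kind measures, the sketch below computes the squared correlation r² between two biallelic SNPs from haplotype and allele frequencies. It assumes that the LD value quoted above is r² (the text does not say whether it is r² or D′), and the function name and inputs are hypothetical; the study itself derived LD from 1000 Genomes reference data.

```python
def ld_r2(p_ab: float, p_a: float, p_b: float) -> float:
    """r^2 between two biallelic SNPs.

    p_ab -- frequency of the haplotype carrying allele A at SNP 1 and allele B at SNP 2
    p_a  -- frequency of allele A at SNP 1
    p_b  -- frequency of allele B at SNP 2
    """
    d = p_ab - p_a * p_b                               # disequilibrium coefficient D
    denom = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denom <= 0.0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom
```

When the two alleles always occur on the same haplotype (p_ab = p_a = p_b), r² equals 1; when the SNPs are independent (p_ab = p_a × p_b), r² equals 0. A value near 1 therefore means the two SNPs carry largely the same association information.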

Our study has several limitations. First, we have not identified the functional genetic variants within our association regions. Nevertheless, we found significant differences in radiological parameters related to airway wall thickness according to genotypes of the novel top SNP, rs34391416, within COPDGene. These CT parameters have been frequently used as objective indicators of airway disease [6],[58]. Interestingly, there were no differences in emphysema severity or distribution according to this SNP genotype. Further studies will be required to identify the functional genetic variants within this region and to determine which gene they influence. Second, we have not performed any independent replication, although this analysis was a meta-analysis of three GWASs. However, a supplemental meta-analysis of four cohorts, adding COPDGene AAs, showed results similar to those of the three cohorts. Third, CB was present in some of our smoking controls. However, an additional meta-analysis of three GWASs of CB COPD versus smoking controls without CB showed similar results.

Conclusions

We have identified a novel locus on 11p15.5, which includes several biologically plausible candidates (EFCAB4A, CHID1 and AP2A2) as potential CB susceptibility genes. We have also found significantly increased effect sizes of FAM13A SNPs in COPD subjects with CB compared to those without CB. Although our secondary GWAS of CB versus no CB within COPD subjects did not show genome-wide significant SNPs, a locus including ATF6 should be explored for its related functional consequences. This study supports the concept that different genetic susceptibility contributes to phenotypic heterogeneity within COPD.

Additional file