Background

The presence of respiratory symptoms, such as chronic cough, dyspnea and phlegm, is associated with lower lung function [1, 2] and with mortality due to several causes of death [3–5]. Respiratory symptoms have been regarded as important markers of accelerated lung function decline [6, 7] and development of asthma [8].

Cigarette smoking [9], allergy [10, 11], air pollution [12, 13] and occupational exposures [14, 15] are known risk factors for respiratory symptoms. However, not all exposed subjects develop respiratory symptoms, which suggests that a genetic component may be involved in their development. Previous candidate gene studies reported associations between respiratory symptoms and specific genetic loci [16, 17]. To date, only one genome-wide association (GWA) study has investigated genetic susceptibility to a respiratory symptom (i.e. chronic mucus hypersecretion) [18]. Genetic susceptibility to respiratory symptoms such as cough, dyspnea, and phlegm has not previously been studied using GWA methods.

In the current study, we conducted separate GWA analyses on cough, dyspnea and phlegm in 7,976 Caucasians of Dutch descent from the large population-based LifeLines I cohort study to identify common genetic variants associated with respiratory symptoms. We used the LifeLines II cohort and the Vlagtwedde-Vlaardingen cohort to replicate our initial findings.

Methods

Identification cohort

Genotyped individuals from the first data release of the LifeLines cohort study (2006–2011, LifeLines I) with full data on all covariates were included (n = 7,976). The LifeLines cohort study is a prospective population-based cohort studying health and health-related behavior of subjects from the three Northern provinces of the Netherlands [19, 20].

Replication cohorts

We included 5,260 subjects from the second data release from the LifeLines cohort study (2006–2011, LifeLines II) and 1,529 subjects from the last survey (1989/1990) from the Vlagtwedde-Vlaardingen cohort [21, 22], a prospective general population based cohort including Caucasians of Dutch descent, to replicate our initial findings.

Ethics, consent and permissions

Participants provided written informed consent. The study was approved by the Medical Ethics Committee of the University Medical Center Groningen, Groningen, The Netherlands (ref. METc 2007/152).

Genotyping and quality control

Genome-wide genotyping was performed in the identification and replication cohorts using Illumina CytoSNP-12 arrays. The Illumina CytoSNP-12 is an oligonucleotide chip designed to have a uniform spacing of markers across all chromosomes, with the majority of the markers reflecting common SNPs: 93% of the 301,232 markers on this chip are bi-allelic SNP markers. The genotyping quality control criteria applied in the LifeLines cohort and the Vlagtwedde-Vlaardingen cohort have been described before [19, 20]: samples with call-rates of less than 95% were excluded, as were samples of non-Caucasians and first degree relatives. SNPs were excluded if they had a genotype call-rate < 95%, a minor allele frequency (MAF) < 1%, or a Hardy-Weinberg equilibrium (HWE) p-value < 10⁻⁴. After quality control, 227,981 SNPs were included in the LifeLines cohort and 242,926 SNPs in the Vlagtwedde-Vlaardingen cohort.
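The SNP-level exclusion criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study: the `SnpStats` record and the `passes_qc` helper are hypothetical names introduced here, and the per-SNP summary statistics are assumed to have been computed already. Only the three thresholds are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class SnpStats:
    """Per-SNP summary statistics (hypothetical container for illustration)."""
    call_rate: float  # fraction of samples with a genotype call, 0..1
    maf: float        # minor allele frequency, 0..0.5
    hwe_p: float      # Hardy-Weinberg equilibrium test p-value


def passes_qc(snp: SnpStats,
              min_call_rate: float = 0.95,
              min_maf: float = 0.01,
              min_hwe_p: float = 1e-4) -> bool:
    """Keep a SNP only if it meets none of the three exclusion criteria."""
    return (snp.call_rate >= min_call_rate
            and snp.maf >= min_maf
            and snp.hwe_p >= min_hwe_p)


# Made-up example values, not data from the study.
example = SnpStats(call_rate=0.99, maf=0.20, hwe_p=0.5)
print(passes_qc(example))
```

A SNP is retained only when all three conditions hold, which mirrors the "excluded if any criterion is met" wording of the text.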

Respiratory symptoms

Cough, dyspnea, and phlegm were defined by standardized questionnaires from the European Community Respiratory Health Survey (ECRHS) [23]. Cough was defined as at least one positive answer to the questions: “do you usually cough first thing in the morning in the winter?” or “do you usually cough during the day, or at night, in winter?”. Dyspnea was defined as a positive answer to the question: “are you troubled by shortness of breath when hurrying on level ground or walking up a slight hill or stairs at normal pace?”. Phlegm was defined as at least one positive answer to the questions: “do you usually bring up any phlegm from your chest first thing in the morning in winter?” or “do you usually bring up any phlegm from your chest during the day, or at night, in winter?”.

Statistical analysis

The data are presented as median (min-max) for continuous variables and as frequencies (percentages) for categorical variables. The GWA analyses on the presence of the respiratory symptoms cough, dyspnea, and phlegm were performed using PLINK version 1.07 [24]. We used an additive genetic model adjusted for age, sex, and current smoking. SNPs with a p-value < 10⁻⁴ in the identification analysis were taken forward for replication. Replication analysis was performed by analyzing the two replication cohorts separately using a logistic regression model in PLINK version 1.07 [24] and subsequently meta-analyzing the effect estimates from both cohorts. Significant replication was defined as a fixed effect meta-analysis p-value < 0.05 and an effect estimate in the same direction as in the identification GWA study. SNP annotation was performed using HaploReg version 4 (Broad Institute).
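The replication rule described above can be sketched as follows. The text specifies a fixed effect meta-analysis but not the weighting scheme; the sketch assumes the common inverse-variance weighting of per-cohort log odds ratios, which may differ from the software actually used. The function names and the numeric inputs are hypothetical.

```python
import math


def fixed_effect_meta(betas, ses):
    """Inverse-variance fixed-effect pooling of log odds ratios (assumed scheme).

    betas: per-cohort log(OR) estimates; ses: their standard errors.
    Returns the pooled log(OR), its standard error and a two-sided p-value.
    """
    weights = [1.0 / se ** 2 for se in ses]
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    z = pooled_beta / pooled_se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, p_value


def replicates(identification_beta, pooled_beta, p_value, alpha=0.05):
    """Replication as defined in the text: p < alpha and same effect direction."""
    return p_value < alpha and identification_beta * pooled_beta > 0


# Made-up inputs for two replication cohorts, not results from the study.
beta, se, p = fixed_effect_meta(betas=[-0.20, -0.15], ses=[0.10, 0.12])
print(replicates(identification_beta=-0.33, pooled_beta=beta, p_value=p))
```

Requiring the same sign for the identification and pooled estimates guards against counting a nominally significant effect in the opposite direction as a replication.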

Results

Demographic characteristics and the prevalence of respiratory symptoms in the study cohorts are summarized in Table 1. In the identification cohort LifeLines I, the median age of subjects was 47 years old, 43% were male, and 24% were current smokers. The replication cohorts were comparable with the identification cohort with respect to demographic characteristics. The prevalence of respiratory symptoms in the LifeLines cohorts and Vlagtwedde-Vlaardingen cohort varied from 10 to 22%.

Table 1 Characteristics of the subjects included in the identification (LifeLines I) and replication (LifeLines II and Vlagtwedde-Vlaardingen) cohorts

The Manhattan plots of the GWA analyses of cough, dyspnea and phlegm are shown in Additional file 1: Figures S1, S2 and S3, respectively. A total of 17, 19 and 14 SNPs were identified for cough (Table 2), dyspnea (Table 3) and phlegm (Table 4), respectively, in the identification analyses in LifeLines I and were taken forward for replication in LifeLines II and Vlagtwedde-Vlaardingen. The SNP rs16918212 (OR = 0.72, p = 5.41 × 10⁻⁵ in identification; OR = 0.83, p = 0.033 in replication), located in A2MP1, was significantly associated with cough in the replication cohorts with the same direction of effect as in the identification cohort (Table 2). The replication analyses on dyspnea and phlegm showed no significant replication (Table 3 and Table 4).

Table 2 Top SNPs (n = 17) associated with cough in the GWA study (all P < 1.0 × 10⁻⁴)
Table 3 Top SNPs (n = 19) associated with dyspnea in the GWA study (all P < 1.0 × 10⁻⁴)
Table 4 Top SNPs (n = 14) associated with phlegm in the GWA study (all P < 1.0 × 10⁻⁴)

In addition, we performed GWA analyses on chronic cough and phlegm (both defined as cough or phlegm for at least 3 months per year) and found no significant replication in these analyses either (Additional file 1: Tables S1 and S2).

Discussion

To the best of our knowledge, this is the first GWA study assessing genetic variants associated with cough, dyspnea, and phlegm. In the identification cohort, we identified 17, 19 and 14 SNPs associated with cough, dyspnea and phlegm, respectively, at a significance level of p < 10⁻⁴. In the meta-analysis of two independent replication cohorts, one association was replicated, between cough and rs16918212, located on chromosome 12 in an intron of A2MP1. No associations with dyspnea or phlegm were replicated.

The odds ratio for this SNP indicates that carriers of the A allele have a lower risk of cough than subjects with the wild type genotype. This SNP is located in an intron of A2MP1 (alpha-2-macroglobulin pseudogene 1). A2MP1 has been associated with Alzheimer’s disease [25]. Pseudogenes are genomic DNA sequences that resemble normal genes but are generally regarded as non-functional; they have lost their expression in the cell or their ability to code for protein [26]. However, some pseudogenes can be functional when they are transcribed, and increasing evidence suggests that pseudogenes may have important physiological functions [26].

A major strength of this study is that it is the first GWA study trying to identify genetic susceptibility loci for cough, dyspnea and phlegm, and that it included two verification samples: one using the same methodology (LifeLines II) and one using similar methodology (Vlagtwedde-Vlaardingen) as the discovery sample (LifeLines I). In addition, the respiratory symptoms that we studied were defined based on the standardized questionnaire of the ECRHS.

A GWA study has the advantage of being hypothesis-free, which means that it has the potential to find new genes underlying disease phenotypes [27]. However, GWA studies also have some disadvantages, such as the need for a large study sample, the need for replication, the inability to address causation, and the inability to investigate rare genetic variants [27].

A limitation of our study might be that we used a liberal p-value threshold (p < 10⁻⁴) for identification of SNPs in the identification cohort, which we chose to keep the risk of missing a true association between genetic markers and respiratory symptoms low. However, when we assessed these associations in the replication cohorts, the total number of significant associations in the replication meta-analysis was less than expected by chance: 1 of the 50 SNPs analyzed for replication (2%) had a p-value < 0.05 and the same direction of effect as in the identification analysis. In addition, given that rs16918212 and the A2MP1 gene have not been associated with lung function impairment or respiratory diseases, we think the association is likely not a true finding. We therefore conclude that there was no convincing association between genetic markers and respiratory symptoms in this study.
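The comparison with chance in the preceding paragraph can be made explicit with a short calculation. This is a sketch under stated assumptions, not a calculation reported in the study: it assumes that the 50 SNPs are independent, that under the null hypothesis a two-sided p-value < 0.05 occurs with probability 0.05, and that a null effect is equally likely to point in either direction. The function name is hypothetical.

```python
def expected_chance_replications(n_snps: int,
                                 alpha: float = 0.05,
                                 require_same_direction: bool = True) -> float:
    """Expected number of null SNPs meeting the replication rule by chance.

    Assumes independent SNPs and a symmetric null, so requiring the same
    effect direction halves the per-SNP false-positive probability.
    """
    per_snp = alpha / 2.0 if require_same_direction else alpha
    return n_snps * per_snp


n_tested = 17 + 19 + 14  # SNPs taken forward for cough, dyspnea and phlegm
print(expected_chance_replications(n_tested))
print(expected_chance_replications(n_tested, require_same_direction=False))
```

Under these assumptions roughly one to three chance replications would be expected among 50 SNPs, so the single replicated SNP does not exceed what chance alone could produce.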

The absence of a plausible significant association between SNPs and respiratory symptoms can possibly be explained by the fact that a respiratory symptom can be caused by different environmental exposures or can be a presentation of different underlying diseases with specific genetic or environmental origins. For example, cough can be triggered by smoking, air pollution and occupational exposures. Susceptibility to these various exposures may be genetically determined, and susceptibility loci may differ between exposures. In addition, cough is a common symptom of several chronic respiratory conditions such as asthma, chronic obstructive pulmonary disease (COPD), and lung cancer [28], but it is also present in non-respiratory conditions such as heart failure [29]. Dyspnea is a common symptom not only in patients with lung and heart diseases, but is also fairly prevalent among elderly individuals without apparent pre-existing disease [5].

Conclusion

We did not find a convincing association between genetic markers and the presence of the respiratory symptoms cough, dyspnea and phlegm. This lack of association may be because we did not take into account the effect of environmental exposures that give rise to respiratory symptoms. Therefore, the next logical step will be to perform a genome-wide interaction (GWI) study to identify genetic loci for respiratory symptoms in interaction with known harmful environmental exposures.