Background

Chronic Obstructive Pulmonary Disease (COPD) is a major cause of morbidity and mortality worldwide. The World Health Organisation's Global Burden of Disease and Risk Factors project estimates that in 2001, COPD was the fifth leading cause of death in high-income countries and the sixth leading cause of death in countries of low and middle income [1]. Cigarette smoking is the main aetiological risk factor in disease development. However, only about 10-20% of smokers develop clinically significant COPD, though this may be an underestimate [2]. Furthermore, there is variability between smokers who have similar levels of smoke exposure, with many developing milder forms of the disease [3]. These observations suggest that underlying genetic factors contribute to either disease susceptibility or protection.

A widely accepted hypothesis of COPD causation is an imbalance of proteolytic enzymes and their inhibitors based on the observation that severe alpha1-antitrypsin deficiency predisposes cigarette smokers to the development of pulmonary emphysema in early adult life. This is thought to be due to uninhibited neutrophil elastase (NE) activity degrading elastin, a major component of lung connective tissue [4]. There are a number of other proteolytic enzymes also capable of degrading elastin, most particularly the matrix metalloproteinases (MMPs), a family of potent proteinases that degrade all the major protein components of the extra-cellular matrix (ECM).

MMPs play a key role in tissue remodelling and repair, and there is significant evidence that members of the MMP family may also play an important role in COPD pathology. Transgenic mice over-expressing MMP-1 develop emphysema [5], whilst MMP-12 knockout mice are protected from emphysema despite prolonged cigarette smoke exposure [6], implicating MMP-12 as a compelling emphysema determinant in this model. In COPD patients, a range of studies implicate MMPs in the pathogenesis of the disease [7-12]. A number of studies have reported associations of genetic variants in MMPs-1, 9 and 12 with COPD or related phenotypes. MMP-1 and MMP-12 are located in close proximity on chromosome 11 and MMP-9 is located on chromosome 20. Reported associations of these genes with COPD include associations of haplotypes of MMPs-1 and 12 with rate of decline in lung function in Caucasians [13] and association of an MMP-9 promoter polymorphism with emphysema in a Japanese population [14]. However, there are conflicting data on the potential involvement of MMP variation in COPD. This likely reflects a number of issues relating to false positives in case-control studies or a lack of power due to relatively small sample sizes and the limited number of single nucleotide polymorphisms (SNPs) investigated. Previous studies of MMPs-1, 9 and 12 are summarised in Table 1. Other issues that contribute to lack of reproducibility include variation in the definition of cases and controls, improper matching of cases and controls and ethnic differences [15]. The results therefore need to be interpreted with caution.

Table 1 Previous association studies performed in COPD

To address the major issues, a European cohort of 977 Caucasian COPD patients and 876 non-diseased smoking controls with complete genotyping information was used to evaluate the relationship of 26 SNPs within MMPs-1, 9 and 12 with COPD and to investigate associations with GOLD-defined severity phenotypes. These tagging SNPs provide coverage of most of the variation within the MMP genes.

Methods

Subjects

COPD and non-diseased smoking control subjects were recruited from six European centres. Only white Caucasians were recruited. Study approval was obtained from the appropriate local committee for each recruitment centre. Informed consent was obtained from all subjects. The resource of COPD patients and non-diseased smoking controls has been described previously [16-19] and is one of the largest reported to date. The number of subjects recruited from each centre was as follows: Barcelona (Spain): 70 controls and 138 cases; Bristol (UK): 152 controls and 129 cases; Dublin (Ireland): 195 controls and 196 cases; Edinburgh (UK): 81 controls and 168 cases; Leiden (the Netherlands): 216 controls and 188 cases; and Pisa (Italy): 198 controls and 198 cases. It should be noted that the total resource consisted of 1017 COPD patients and 912 controls. If strict criteria for genotyping were not met or assay design proved difficult, results were not included in the present analysis to minimise potential errors. This resulted in completed unequivocal genotypes in 977 COPD patients and 876 controls.

Briefly, recruitment criteria were agreed by a panel of respiratory physicians with an interest in COPD and included a firm clinical diagnosis of COPD; airflow limitation as indicated by forced expiratory volume in 1 second (FEV1) ≤ 70% of normal predicted values and a ratio of FEV1 to forced vital capacity (FVC) < 0.7; no significant reversibility with salbutamol (≥ 200 μg); and a smoking history of ≥ 20 pack years, with matched smoking history for the controls. FEV1 is measured using a spirometer; it is the maximum volume of air that can be forcibly breathed out in one second after full inspiration. FVC is the maximum volume of air that can be forcibly breathed out after full inspiration.

Patients were excluded from the study if they had an established diagnosis of asthma, lung cancer, history of atopy, known alpha1-antitrypsin (AAT) deficiency or a serum AAT level of less than 1.0 g/L. Patients with acute exacerbations in the four weeks preceding study assessment were also excluded. This was an agreed part of the protocol to enable future studies, where appropriate, to investigate correlations of genotypes with quantitative traits such as lung function or concentrations of analytes in biological fluids. We did not recruit any patients with Global Initiative for Chronic Obstructive Lung Disease (GOLD) Stage I. Disease severity was classified according to GOLD as follows: GOLD Stage II (moderate disease): FEV1/FVC ratio of < 0.70 and FEV1 ≥ 50% and < 80% predicted; GOLD Stage III (severe disease): FEV1/FVC ratio of < 0.70 and FEV1 ≥ 30% and < 50% predicted; and GOLD Stage IV (very severe disease): FEV1/FVC ratio of < 0.70 and FEV1 < 30% predicted [20].
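The GOLD thresholds quoted above can be expressed as a small decision rule. The following is a minimal sketch only; the function name `gold_stage` and its argument names are hypothetical and were not part of the study protocol, and the additional recruitment criterion of FEV1 ≤ 70% predicted is not enforced here.

```python
from typing import Optional


def gold_stage(fev1_pct_predicted: float, fev1_fvc_ratio: float) -> Optional[str]:
    """Return the GOLD stage ("II", "III" or "IV") for the thresholds in the text.

    Returns None when the subject has no obstruction by the ratio criterion,
    or when FEV1 is >= 80% predicted (GOLD Stage I, which was not recruited).
    """
    if fev1_fvc_ratio >= 0.70:
        return None  # FEV1/FVC not below 0.70: no airflow obstruction
    if fev1_pct_predicted < 30:
        return "IV"  # very severe disease
    if fev1_pct_predicted < 50:
        return "III"  # severe disease
    if fev1_pct_predicted < 80:
        return "II"  # moderate disease
    return None  # would correspond to GOLD Stage I
```

For example, a patient with FEV1 of 45% predicted and FEV1/FVC of 0.60 would be assigned to Stage III under this rule.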

Control subjects with no evidence of airflow obstruction (FEV1 and FVC ≥ 80% and FEV1/FVC > 0.70) were recruited at each centre with attempted matching to COPD patients for ethnicity, age, gender and smoking history. Exclusion criteria were as described for cases and additionally included family history of COPD. However, complete matching was not achieved due to a high proportion of smokers aged 65 or over having evidence of some pulmonary obstruction, thereby excluding them from recruitment as controls.

We compared the frequencies of MMP SNPs for each centre in cases and controls to look for population stratification. For the genes under study, population stratification would have been reflected in different frequencies in the centre-to-centre comparisons. No such differences were observed. These findings are in keeping with previous observations on this sample set [19] and with data obtained from British samples reported by the Wellcome Trust Case Control Consortium in genome-wide association studies, where stratification has been shown to be not as significant an issue in the British population as previously thought [21]; similar findings are likely for western European populations.

Study Population Characteristics

The study population characteristics are summarised in Table 2. Despite attempts to match cases and controls there were significant differences observed and adjustments were made by logistic regression to take this into account in the statistical analysis. The characteristics of recruited COPD patients across the GOLD severity categories are shown in Table 3.

Table 2 Characteristics of Controls and COPD subjects
Table 3 Characteristics of COPD patients according to GOLD classification of disease severity

Power Calculations

Using Quanto software for the power calculations, with an autosomal dominant model, this study has 80% power to detect an odds ratio of 1.5 with an allele frequency of 0.1.

SNP Selection and Haplotypes

SNP identification within the MMPs-1, 9 and 12 genes was carried out using information obtained from a combination of databases including HapMap and Seattle SNPs. SNPs with a minor allele frequency of less than 5% were excluded from the study, as these were likely to be insufficiently powered to detect association. SNPs lacking any validation status as described on the dbSNP database were also excluded. Further narrowing of SNP selection was based on putative function described in the databases and predicted function by the FastSNP web-service [22]. Where SNPs were in linkage disequilibrium (r² ≥ 0.8) in previously reported Caucasian populations (HapMap, National Heart Lung and Blood Institute's (NHLBI) Programs for Genomic Applications European (PGA EUR) population, or Environmental Gene Project Centre d'Etude du Polymorphisme Humain (EGP CEPH) European population), a single tagging SNP was chosen for genotyping, thus minimising the number of SNPs that needed to be investigated and reducing the number of statistical tests required. This resulted in 26 SNPs across the three genes being selected for analysis. Information on the 26 genotyped SNPs is given in Table 4. The SNPs were selected as described below.

Table 4 SNPs selected for genotyping

MMP-1

A panel of 21 validated SNPs was identified within the gene and its promoter region. However, rs488178 was found to be unsuitable for assay design and was excluded. Consideration of the linkage disequilibrium between the SNPs resulted in a final panel of 13 SNPs being genotyped, giving 87% coverage of validated SNP variation in MMP-1.

MMP-9

A panel of 23 validated SNPs was identified within the gene and its promoter region from the databases, which covered all known validated SNPs within the region. However, rs25650 was found to be non-polymorphic in the study cohort and was excluded from further analysis. Consideration of the linkage disequilibrium between the SNPs resulted in a final panel of 8 tagged SNPs being genotyped, giving 96% coverage of validated SNP variation in MMP-9.

MMP-12

A panel of 17 validated SNPs was identified within the gene and its promoter region, a selection which covered all known validated SNPs within the region. However, rs28360355, rs28381675 and rs28360356 were found to be unsuitable for assay design and were excluded. Consideration of the linkage disequilibrium between the SNPs resulted in a final panel of five SNPs being genotyped, giving 82% coverage of validated SNP variation in MMP-12.
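The reduction of each validated panel to a smaller set of tagging SNPs can be illustrated with a greedy selection over pairwise r² values. This is a sketch under stated assumptions: the text does not say which tagging algorithm was applied, so the greedy rule, the function name `greedy_tag_snps` and the SNP labels below are all hypothetical, and the r² values are invented for illustration.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_tag_snps(
    snps: List[str],
    r2: Dict[FrozenSet[str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Pick tag SNPs so that every SNP is a tag or has r2 >= threshold with a tag."""
    remaining: Set[str] = set(snps)
    tags: List[str] = []

    def partners(snp: str) -> Set[str]:
        # SNPs still untagged that are in strong LD with `snp`
        return {
            other
            for other in remaining
            if other != snp and r2.get(frozenset((snp, other)), 0.0) >= threshold
        }

    while remaining:
        # Choose the SNP that captures the most untagged SNPs (ties: alphabetical)
        best = max(sorted(remaining), key=lambda s: len(partners(s)))
        tags.append(best)
        remaining -= partners(best) | {best}
    return tags


# Hypothetical pairwise r2 values; pairs not listed are treated as r2 = 0
example_r2 = {
    frozenset(("snpA", "snpB")): 0.90,
    frozenset(("snpB", "snpC")): 0.85,
    frozenset(("snpA", "snpC")): 0.50,
}
tags = greedy_tag_snps(["snpA", "snpB", "snpC", "snpD"], example_r2)
print(tags)  # ['snpB', 'snpD']
```

In this toy example snpB tags both snpA and snpC, while snpD is in LD with nothing and must be genotyped itself, so four candidate SNPs reduce to two assays.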

Genotyping of Study Samples

Full genotyping information was available in 977 COPD cases and 876 controls. Genotyping was carried out commercially by K-Bioscience using KASPar, an in-house validated competitive allele-specific polymerase chain reaction (PCR) SNP genotyping system which utilises fluorescence resonance energy transfer quencher cassette oligonucleotides. TaqMan, a rapid fluorophore-based real-time PCR method, was used to genotype SNPs found to be unsuitable for KASPar assay design. As a quality control measure, approximately 5% of samples were genotyped in duplicate to check for concordance. There was 100% concordance between the duplicates, satisfying criteria for the assays to be accepted for further analysis.

Statistical Analysis

Each of the SNPs in the three genes was analysed for Hardy-Weinberg equilibrium (HWE) using PROC ALLELE in SAS/Genetics Release 9.1.3 [23]; an empirical p-value of ≤ 0.01 was used as the cut-off, to reduce the likelihood of reporting false positives. To examine linkage disequilibrium, the correlation coefficient between SNP pairs within each gene in cases and controls was calculated using Haploview 4.1 [24], a software package that computes linkage disequilibrium and haplotype blocks from genotype data (Figure 1).
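For readers unfamiliar with the HWE check, the idea can be sketched with the asymptotic chi-square goodness-of-fit test on genotype counts. Note that this is not the procedure used in the study, which relied on empirical p-values from PROC ALLELE; the function name `hwe_chi_square` is hypothetical, and the test is a simpler stand-in with one degree of freedom.

```python
import math
from typing import Tuple


def hwe_chi_square(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Chi-square test (1 df) of Hardy-Weinberg proportions for a biallelic SNP.

    Arguments are the observed counts of the two homozygotes and the heterozygote.
    Returns (chi-square statistic, asymptotic p-value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of the chi-square distribution with 1 degree of freedom
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

A SNP whose control genotypes gave a p-value at or below the chosen cut-off would be flagged as deviating from HWE, which in practice often points to a sub-optimal assay rather than a biological effect.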

Figure 1

LD plot in controls of SNPs in MMPs-1, 9 and 12. LD was estimated as r² using Haploview 4.1. SNP codes are provided in order of location along each gene; dark grey squares depict strong LD (1.0) with strong confidence, pale grey and white regions represent low LD, and the r² value is provided within each box. SNP positions are shown 5' to 3' relative to contig position (HuRef NCBI build 36.3).
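The squared correlation coefficient plotted in Figure 1 has a simple definition in terms of two-locus haplotype frequencies. The sketch below assumes the four haplotype frequencies are already known; Haploview estimates them from unphased genotype data, a step that is not reproduced here. The function name `ld_r2` and its arguments are hypothetical.

```python
def ld_r2(f_11: float, f_12: float, f_21: float, f_22: float) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    f_ij is the frequency of the haplotype carrying allele i at the first
    locus and allele j at the second locus; the four values must sum to 1.
    """
    total = f_11 + f_12 + f_21 + f_22
    if abs(total - 1.0) > 1e-9:
        raise ValueError("haplotype frequencies must sum to 1")
    p_a = f_11 + f_12  # frequency of allele 1 at the first locus
    p_b = f_11 + f_21  # frequency of allele 1 at the second locus
    denominator = p_a * (1.0 - p_a) * p_b * (1.0 - p_b)
    if denominator == 0.0:
        raise ValueError("both loci must be polymorphic")
    d = f_11 - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denominator
```

Two loci whose alleles always travel together give a value of 1, whereas independent loci give a value of 0.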

Analyses of allele and genotype frequencies for the three MMP genes were performed using PROC ALLELE in SAS/Genetics. The adjusted p-values for genotype and allele frequencies were obtained using PROC LOGISTIC in SAS, adjusting for age, sex, smoking and centre.

As haplotypes of MMPs have been reported to be associated with COPD or decline in lung function [13, 25], we used FAMHAP18 [26] to examine haplotypes in all possible SNP combinations based on the eight SNPs in the MMP-9 gene on chromosome 20, and in all combinations consisting of four or fewer SNPs from the 16 analysable SNPs (see below) in MMP-1 and MMP-12 combined on chromosome 11. We considered haplotypes of only up to four SNPs because of the optimal power considerations in FAMHAP18. We compared haplotype distributions for all cases versus controls, for GOLD severity III and IV cases versus controls and for GOLD severity IV only versus controls. As a cut-off criterion we used p < 0.01 for a significance test on the contingency table of all possible simulated haplotypes within each possible SNP combination in cases and controls.

P-values were based on 10,000 simulations and, since for this number of simulations the 95% confidence limits for 0.01 are ± 0.002, we report all comparisons with a p-value less than 0.012. To allow for multiple testing we used the maximum estimated q-value that corresponds to p-values ≤ 0.012 with a high stringency estimate of the false discovery rate (FDR) [27, 28] (using the q-value package in R [29]).
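The ± 0.002 margin follows from the binomial standard error of a simulated p-value. A short check, assuming the usual normal approximation with z = 1.96 (the function name `monte_carlo_half_width` is hypothetical):

```python
import math


def monte_carlo_half_width(p: float, n_sim: int, z: float = 1.96) -> float:
    """Half-width of the approximate 95% confidence interval for a p-value
    estimated as a proportion from n_sim independent simulations."""
    return z * math.sqrt(p * (1.0 - p) / n_sim)


half_width = monte_carlo_half_width(0.01, 10_000)
upper_limit = 0.01 + half_width
print(f"half-width = {half_width:.4f}, upper limit = {upper_limit:.3f}")
# half-width = 0.0020, upper limit = 0.012
```

This is why comparisons with simulated p-values up to 0.012 are reported: such values cannot be distinguished from the nominal 0.01 threshold at this number of simulations.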

Odds ratios for haplotype and individual SNP allele distributions are relative to the common haplotype or allele. The adjusted odds ratios with 95% confidence intervals, adjusting for age, sex, smoking and centre are estimated for haplotypes by weighted logistic regression as described previously [19] (PROC LOGISTIC in SAS), and for alleles by standard logistic regression.
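As a point of reference for the adjusted estimates, a crude (unadjusted) odds ratio with a 95% confidence interval can be obtained directly from a 2 x 2 table of allele or haplotype counts. The sketch below uses the standard log odds ratio (Woolf) interval rather than the logistic regression used in the study, and the counts in the usage note are invented for illustration; the function name `odds_ratio_ci` is hypothetical.

```python
import math
from typing import Tuple


def odds_ratio_ci(
    cases_variant: int,
    cases_reference: int,
    controls_variant: int,
    controls_reference: int,
    z: float = 1.96,
) -> Tuple[float, float, float]:
    """Crude odds ratio of the variant relative to the reference (common)
    allele or haplotype, with a Woolf-type confidence interval.

    All four counts must be positive. Returns (odds ratio, lower, upper).
    """
    odds_ratio = (cases_variant * controls_reference) / (
        cases_reference * controls_variant
    )
    # Standard error of the log odds ratio
    se_log_or = math.sqrt(
        1.0 / cases_variant
        + 1.0 / cases_reference
        + 1.0 / controls_variant
        + 1.0 / controls_reference
    )
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

With hypothetical counts of 20 variant and 80 reference alleles in cases against 10 and 90 in controls, `odds_ratio_ci(20, 80, 10, 90)` gives a crude odds ratio of 2.25. Unlike the estimates reported here, such a value takes no account of age, sex, smoking or centre.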

Using PROC GLM in SAS [23] the quantitative trait associations between SNPs or haplotypes and the phenotype FEV1 were tested by multivariate regression using a similar weighted method to that used in logistic regression [19], adjusting for age, sex, smoking and centre. The adjusted means were determined using the LSMEANS option.

Results

Quality Control

The full list of SNPs analysed is shown in Table 4. The overall genotype call rate was 95% (range, 92-96%), and the accuracy was 100% according to duplicate genotyping of > 5% of samples. The use of high stringency cut-offs resulted in the loss of some genotyping data and a reduction in the number of patients studied from the original sample set. Significant deviation from HWE was observed in controls for two MMP-1 SNPs, rs470358 and rs2071230. The assays for these two SNPs were sub-optimal, with overlap in the signals observed for homozygotes and heterozygotes.

Association Analysis

Table 5 compares the genotype frequencies between all cases and controls for the 26 SNPs analysed. There were no significant differences at p < 0.01 in either the crude analysis or the analysis adjusted for age, sex, smoking levels and centre. To evaluate potential genetic contribution to more severe manifestations of the disease, those with moderate levels of disease were excluded from the cases and patients with either severe or very severe disease (GOLD groups III and IV) were compared to controls. This comparison was also performed for cases with very severe disease only (GOLD group IV). This approach enabled evaluation of the potential underlying genetic contribution to disease severity phenotype within the study population with least reduction in power. While unadjusted analysis indicated significant associations between genotype frequencies of SNP 18 and more severe disease, these became non-significant with p > 0.01 when adjusted for age, sex, smoking and centre (Table 5). Table 6 analyses allele relationships for all SNPs.

Table 5 MMPs -1, 12 and 9 - Genotype frequencies in Controls and Cases
Table 6 MMPs -1, 12 and 9 - Allelic frequencies in Controls and Cases

As described in the Methods, we then examined all possible haplotypes consisting of between two and four SNPs on the same chromosome for differences between the case groups defined above and controls. After adjusting for multiple testing, the only significant differences in haplotype distributions were found when severe/very severe (GOLD groups III and IV) cases were involved. Table 7 shows the seven significant haplotypes together with their unadjusted p-values. Since the haplotype comprising SNPs 14 and 18 (p = 0.0086) is a subset of all the other significant combinations identified, we deduce that the significant results are a consequence of the association of SNPs 14 and 18 with disease severity in COPD, despite the apparently high false discovery rate of 0.54. The low LD (r² = 0.083) in controls between these two SNPs indicates the separate involvement of both SNPs in this association.

Table 7 MMP-1 and MMP-12 Risk Haplotypes related to disease GOLD severity III and IV combined versus controls

Table 8 shows the actual haplotypes of SNPs 14 and 18 in the MMP-12 gene and their frequencies in severe/very severe disease (GOLD Stages III and IV) compared to controls. For completeness the table also shows the relationship of the alleles of each SNP alone with severity. Persons with the rare G allele of either SNP 14 or SNP 18 were protected against severe/very severe COPD. Compared with the common A-A haplotype of SNPs 14 and 18, subjects with the G allele at either locus had a significantly reduced risk of having severe/very severe COPD. Of note, and of importance from a population perspective, is that, based on the distribution in controls, 18% of the European population have at least one of these protective haplotypes, with an average risk 0.76 (95% CI: 0.61 to 0.94) times that of those with the common haplotype.

Table 8 MMP-12 haplotypes significantly different between GOLD Classification III and IV cases and controls

Confining the analysis to cases only, we examined the relationship of variation in MMP-12 SNPs 14 and 18 and their haplotypes to severity of disease, using FEV1 as a severity measure. Weighted multivariate regression models, adjusting for age, sex, smoking and centre, were used to test for the associations. Those individuals with the haplotype A-A had a significantly lower predicted FEV1 (42.62% versus 44.79%, p = 0.0129) when compared to patients with any haplotype containing a G allele. We did not find a significant association between FEV1 and the alleles of SNP 14 or SNP 18 singly.

Discussion

A number of studies have reported on the role of genetic variants of MMPs in genetic susceptibility to COPD or related phenotypes. However, the majority of studies have concentrated on the three well-known promoter polymorphisms of these genes, MMP1 -1607 -/G (rs1799750), MMP9 -1562 C/T (rs3918242) and MMP12 -82 A/G (rs2276109), in comparatively small samples (Table 1).

Our study reports the largest and most extensive screen of SNPs within the MMPs-1, 9 and 12 genes undertaken to date and includes, in a larger sample, the three previously reported promoter SNPs or SNPs that are in LD with them. All SNPs evaluated in the final analysis were in HWE in controls, had high call rates and concordant duplicates by genotyping, suggesting that this study was not likely to be influenced by genotyping errors. All subjects were of European descent, and any potential effects of population stratification were minimised by recruiting patients and controls from each centre; this was confirmed by the fact that SNP genotype frequencies in the controls for each centre were found to be similar. Given the greater level of power in our study, it is unlikely that effects of the magnitude observed in previous, less powered studies have been missed. However, there may be subtle differences in phenotypes and association with quantitative traits such as lung function decline in the previous studies that we were not able to test. This is in part due to multiple testing issues, as investigating numerous phenotypes could lead to false positives. Also, phenotype data for some clinical endpoints, such as CT scans for emphysema, have not been collected for our population. Although this may cause difficulties with planning functional work, future work testing associated SNPs in other COPD cohorts characterised for specific phenotypes will address this.

The patients with very severe disease had a mean age that was less than the severe and moderate groups and this may reflect a survival bias.

We found that MMPs-1 and 9 were not associated with COPD in our populations. This is an interesting result, as these genes have previously been implicated in COPD by both rodent studies and a range of human studies. In the previous case-control association studies, many of the associations in MMPs-1 and 9 were demonstrated in non-Caucasian populations. In addition, some of the studies used different phenotypic end points.

For the MMP-12 gene, we identified haplotypes associated with two SNPs, SNP 14 (rs652438) and SNP 18 (rs2276109), which showed associations with severe/very severe COPD. The associations with GOLD Stage III and IV disease suggest that these SNPs may play a modifier role in disease severity. We also showed that haplotypes of the two SNPs were significantly related to severity within the COPD cases, as determined by the quantitative levels of FEV1 within the cases. Those with the haplotypes A-G, G-A or G-G had higher FEV1 levels. Of note, the association of SNP 18 (rs2276109) with COPD has been examined previously, but no effect was seen [13, 30]. This probably reflects underpowered studies and the need to examine disease severity. SNP 14 (rs652438) has previously been shown to be associated with smoking-induced COPD in a Chinese cohort [30], as well as with a decline in lung function when considered as a haplotype with the MMP-1 promoter SNP 13 (rs1799750) [13]. The recent genome-wide association study (GWAS) in COPD by Pillai et al [31] did not report an association for MMP-12, even though SNPs in full linkage disequilibrium with rs2276109 and rs652438 are present on the Illumina 550K platform used. However, this does not preclude the involvement of these SNPs in disease, as they may show association that does not reach genome-wide significance; access to the primary data would be of interest in this regard. Furthermore, in our population MMP-12 SNPs are associated with severe forms of COPD in haplotype analysis, whereas the GWAS compared all COPD cases with controls using single-SNP association.

Although we noted a trend (p = 0.054) for association of SNP 14 (rs652438) with disease, our strict cut-off for significance indicates that it may require a much larger sample size to be confident of such an association. It is of note, however, that SNP 14 (rs652438) is a constituent of all the haplotypes found to be associated with severe/very severe disease, suggesting that it contributes to disease severity in COPD. It is also possible that this locus, or genetic variants in LD with rs652438, may affect susceptibility to COPD in the Chinese population. Whilst we did observe three haplotypes involving SNPs 13, 14 and 18 which were associated with severe/very severe disease (Table 7), there was no significant association for a 2-SNP haplotype with SNPs 13 and 14 only.

SNP 18 (rs2276109) with the alleles A/G is a known functional variant where the A allele shows a higher affinity for the transcription factor activator protein-1 (AP-1), resulting in increased expression in gene reporter assays. In the current study, the A-A haplotype composed of SNP 18 (rs2276109) and SNP 14 (rs652438) is over-represented in the cases, suggesting that increased MMP-12 levels may contribute to COPD pathogenesis. This is consistent with observations of increased MMP-12 activity in the lungs of patients with COPD and with observations in a rodent model of disease where MMP-12 knock-outs are protected against smoking-induced emphysema.

A recent paper by McAloon et al [32] screened MMPs-1, 3 and 12 relatively comprehensively in alpha1-antitrypsin deficiency (AATD), with associations being found with gas transfer. This could point to a role for MMP-12 in parenchymal disease pathophysiology rather than airways disease, especially when considered together with the association with severe forms of COPD in this study and with emphysema in previous studies.

Conclusions

While this study provides suggestive evidence for a contribution of genetic variation in MMP-12 to disease severity in COPD, we do not find evidence for the involvement of MMPs-1 and 9 in COPD. This merits further investigation of MMP-12 associations in similar or larger cohorts.