Introduction

Diseases which cause a decline in lung function remain a huge burden on human society and the economy. One such disease, Chronic Obstructive Pulmonary Disease (COPD), is a heterogeneous and debilitating condition characterised by the development of irreversible airflow obstruction. The development of COPD has a strong environmental basis, with cigarette smoking and exposure to poor air quality being key risk factors. Unlike some common chronic diseases, the incidence of COPD has not declined in recent years; in fact, the prevalence, morbidity and mortality of COPD continue to increase globally. According to the World Health Organisation, 64 million people worldwide have COPD and > 3 million people die of the disease each year [1]. Within the UK alone it is estimated that 3 million people have COPD and that it accounts for 30,000 deaths each year [2]. Critically, COPD is now the third leading cause of death worldwide (Fig. 1) [3].

Fig. 1

Global death ranks for the top 25 causes of death in 1990 and 2010. In 2010, COPD rose to the third leading cause of death worldwide. UI = uncertainty interval, COPD = chronic obstructive pulmonary disease. Reproduced from Lozano et al. [3].

In general, COPD is a progressive condition, leading to airway remodelling, inflammation and narrowing of the small airways and/or alveolar destruction (emphysema), with symptoms generally becoming evident later in life [4]. Although the introduction of smoking bans may help to lower the incidence of COPD in some countries, not all patients with COPD are smokers [5]. It is also important to note that COPD can be caused by biomass exposure. However, in addition to environmental exposures, around 40 % of variability in lung function is estimated to be heritable [6–9]. A range of therapeutic agents is available for the treatment of COPD, including short and long acting β2 agonists, anti-muscarinic agents, inhaled and oral steroids and phosphodiesterase inhibitors. However, whilst these drugs can improve symptoms in some patients, none of them has been shown to alter the progression of the underlying disease.

Review

Diagnosing COPD using spirometry

Spirometry is used to assess lung function in humans. The most useful measures are FEV1 (forced expiratory volume in 1 s) and FVC (forced vital capacity, i.e., the volume of air expired by a full expiration). When the ratio of FEV1 to FVC is under 0.7, this is referred to as an obstructive defect. The severity of COPD can also be assessed by spirometry: in the presence of a reduced FEV1/FVC ratio, an FEV1 of less than 80 % of the predicted value indicates the presence of COPD. Interestingly, it has been shown that spirometry measurements also cluster within families, again suggesting that there is a hereditary component which may influence the development of respiratory disease [10, 11]. Between 20 and 60 % of phenotypic variance in lung function measures is suggested to be attributable to hereditary factors [6–9], a finding supported by strong correlations in twin studies [12].
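The two spirometric thresholds described above can be expressed as a short calculation. The following Python sketch is illustrative only: the class name, function names and example values are hypothetical, and it encodes just the two cut-offs stated in the text (FEV1/FVC under 0.7, and FEV1 under 80 % of predicted) rather than any full clinical grading scheme.

```python
from dataclasses import dataclass

# Cut-offs taken from the text; everything else here is a hypothetical illustration.
RATIO_CUTOFF = 0.7                # FEV1/FVC below this is an obstructive defect
PERCENT_PREDICTED_CUTOFF = 80.0   # FEV1 below this % of predicted, with obstruction


@dataclass
class Spirometry:
    fev1_l: float            # measured FEV1 in litres
    fvc_l: float             # measured FVC in litres
    fev1_predicted_l: float  # predicted FEV1 in litres (from reference equations)


def fev1_fvc_ratio(s: Spirometry) -> float:
    """Ratio of FEV1 to FVC."""
    return s.fev1_l / s.fvc_l


def fev1_percent_predicted(s: Spirometry) -> float:
    """Measured FEV1 as a percentage of the predicted value."""
    return 100.0 * s.fev1_l / s.fev1_predicted_l


def is_obstructive(s: Spirometry) -> bool:
    """True when the FEV1/FVC ratio is under 0.7."""
    return fev1_fvc_ratio(s) < RATIO_CUTOFF


def meets_both_criteria(s: Spirometry) -> bool:
    """True when the ratio is reduced AND FEV1 is under 80 % of predicted."""
    return is_obstructive(s) and fev1_percent_predicted(s) < PERCENT_PREDICTED_CUTOFF


if __name__ == "__main__":
    example = Spirometry(fev1_l=1.8, fvc_l=3.2, fev1_predicted_l=3.0)
    print(round(fev1_fvc_ratio(example), 3), meets_both_criteria(example))
```

The predicted FEV1 is passed in as an input because it depends on reference equations (age, sex, height and ancestry) that are outside the scope of this sketch.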

Environmental and genetic factors of COPD

Smokers are characteristically prone to developing COPD, and smoking is a primary risk factor for the disease. Estimates indicate that after 25 years of smoking 30-40 % of smokers will have COPD [13]. Even non-smokers may be affected due to general exposure to air pollutants. One investigation into long-term smoke particulate matter exposure revealed a significant association between an increase in exposure to small particles and a mild decrease in FEV1 across 20 years [14]. In addition, biomass emissions are a notable risk factor globally, in general consisting of smoke inhalation via indoor pollution or occupational exposure. Genetic predisposition is also a known risk factor which increases an individual’s susceptibility to developing COPD. The most commonly studied example in COPD is α1-antitrypsin deficiency, where individuals (commonly of northern European ancestry) are homozygous for a deleterious mutation in SERPINA1 [15]. Between 1 and 2 % of COPD cases are attributable to this mutation, which leads to enhanced neutrophil elastase activity and ultimately to destruction of the alveoli. Early genetic linkage analyses indicated that gene-by-smoking interactions contribute to a decline in lung function. In those studies the logarithm of odds (LOD) score of genetic linkage was improved by restricting the analysis to smokers, which suggested the existence of an interaction between cigarette smoke exposure and genetic susceptibility [16]. More recently, Liao et al. have explored the effects of gene-by-environment interaction more robustly by using individual SNPs and genetic network approaches [17]. Both approaches identified SNPs near the gene SLC38A8 as significantly modifying the effects of occupational exposure on FEV1. Genetic network analysis alone identified the genes CTLA-4, HDAC and PPAR-alpha as modulating these effects.
This study implied the existence of genes related to inflammatory processes which could modify the effects of occupational exposure on lung function. Readers are referred to the excellent review by Molfino and Coyle of gene-environment interaction in COPD [18].
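For readers unfamiliar with the statistic, the LOD score referred to above is conventionally defined as the base-10 logarithm of a likelihood ratio comparing linkage with no linkage. The general textbook form is given below; the notation is generic and is not taken from the cited studies, which may have used a different formulation suited to quantitative traits.

```latex
\mathrm{LOD} = \log_{10} \frac{L(\text{data} \mid \text{linkage})}{L(\text{data} \mid \text{no linkage})}
```

Under this definition, a higher LOD score in smokers than in the full sample means that the data favour linkage more strongly within that subgroup, which is the pattern interpreted as evidence of gene-by-smoking interaction [16].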

Meta-analyses of GWA studies identify genetic regions associated with FEV1

Large scale genetic studies (genome wide association studies (GWAS)) are now able to accurately reveal associations between phenotypes (such as spirometry measures) and genetic loci. By meta-analysing many GWA studies, researchers have revealed a number of single nucleotide polymorphisms (SNPs) within/near genes which are associated with the lung function measure FEV1 (Table 1). These genes may potentially influence the development or severity of COPD and could also be important in other obstructive diseases of the lung [19, 20].
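To illustrate how per-study results for a single SNP can be combined, the Python sketch below uses inverse-variance weighted fixed-effect meta-analysis. This is one widely used approach and is shown here as an assumption; it is not necessarily the method applied in the studies cited above. The function name and the effect sizes in the example are hypothetical.

```python
import math


def fixed_effect_meta(betas, standard_errors):
    """Inverse-variance weighted fixed-effect meta-analysis for one SNP.

    betas: per-study effect estimates (e.g. change in FEV1 per allele copy)
    standard_errors: per-study standard errors of those estimates
    Returns (pooled_beta, pooled_se, z_score, two_sided_p).
    """
    if len(betas) != len(standard_errors) or not betas:
        raise ValueError("need one standard error per effect estimate")

    # Each study is weighted by the inverse of its variance.
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z_score = pooled_beta / pooled_se

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z_score) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z_score, p_value


if __name__ == "__main__":
    # Hypothetical per-study estimates for a single SNP.
    beta, se, z, p = fixed_effect_meta([-0.04, -0.05, -0.03], [0.010, 0.015, 0.012])
    print(f"pooled beta={beta:.4f}, SE={se:.4f}, z={z:.2f}, p={p:.2e}")
```

Because the pooled standard error shrinks as studies are added, combining cohorts can detect associations that no single study has the power to identify, which is the motivation for the meta-analyses summarised in Table 1.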

Table 1 FEV1 associated SNPs identified using GWAS meta-analyses

Five meta-analyses and one look up of candidate SNPs identified from the SpiroMeta general population were included in the overview of GWAS meta-analyses in Table 1. In 2010, back to back publications by our group [19] and others [20] showed the utility of meta-analysing GWA studies when both studies identified SNPs within the 4q24 locus to be the most significantly associated with FEV1. Hancock et al. identified 46 SNPs at this locus with the smallest p value for SNP rs17331332 located nearest NPNT, whilst the top SNP of our study is located in oppositely transcribed genes INTS12 and GSTCD [20, 21]. Interestingly, a look up of previously suggested candidate genes found no significant associations suggesting that genome wide approaches are the most reliable way to identify true genetic risk factors for COPD and/or lung function phenotypes [22]. In the same year Soler-Artigas et al. reported 16 novel loci associated with lung function; 5 associated with FEV1, 4 of which survived joint meta-analysis of all stages (MECOM (also known as EVI1), ZKSCAN3, CDC123, C10orf11) [23]. Subsequently in 2012, Hancock et al. identified KCNJ2/SOX9 at 17q24.3 to be associated with FEV1 [24]. Given that cigarette smoking adversely affects pulmonary function, the group conducted genome-wide joint meta-analyses of SNPs and SNP by smoking associations. GWAS have also been utilised to identify variants associated with smoking behaviour. In 2010, three pivotal publications identified loci associated with smoking behaviour. Whilst Thorgeirsson et al. identified variants in neuronal acetylcholine receptors, CHRNB3-CHRNA6 and the Cytochrome P450, CYP2A6 associated with smoking behaviour [25], Liu et al. refined the association identified at 15q25 [26]. In the same year the Tobacco and Genetics Consortium identified multiple loci associated with smoking behaviour [27].

More recently, in 2014, Tang et al. studied longitudinal changes in lung function and mean rates of decline by smoking pattern. The strongest association with decline in FEV1 mapped to SNPs at 15q25.1 encompassing IL16/STARD5/TMC3; however, this result did not reach genome-wide significance [28]. Furthermore, Tang et al. studied the rate of FEV1 change in a subsequent meta-analysis of 5 cohorts which had more than 3 measurements per participant. Interestingly, a SNP within BAZ2B was identified at both stages [28].

COPD associated genes

In addition to studies of the genetic basis for lung function in large populations, sixteen case control studies of COPD have been performed to try to identify SNPs in genes which are associated with COPD (Table 2). In GWA studies of COPD cohorts, SNP rs7671167, within FAM13A, was associated with chronic bronchitis, airway obstruction, emphysema and COPD susceptibility [29–32]. Additionally, 9 other SNPs within FAM13A were associated with COPD [29, 31–34]. This region is close to but distinct from the 4q24 locus identified by earlier studies on FEV1 and FEV1/FVC ratio. HTR4 (encoding a serotonin receptor) was also found to be associated with COPD in two separate GWA studies [35, 36]. The 4q24 locus and HTR4 are discussed in more detail in a later section.

Table 2 COPD associated SNPs identified using GWAS or candidate gene methodology

In particular, 6 studies have found numerous SNPs at the 15q25.1 locus to be associated with COPD [30, 31, 34, 35, 37, 38]. This locus encompasses 3 cholinergic nicotinic receptor genes (CHRNA5, CHRNA3 and CHRNB4). However, this locus appears to exert its effects by determining an individual’s risk for nicotine dependence rather than through any direct effect on the lung per se.

Current efforts within the respiratory research community are trying to decipher the biological relevance of the functions of these genes and elucidate whether pathways identified are therapeutically targetable. On comparison of the genes shown in Tables 1 and 2 presenting the top genes associated with either FEV1 (from meta-analysing GWA studies) or COPD, the most obvious priorities for further research would appear to include TNS1, genes within the 4q24 locus (FLJ20184, INTS12, GSTCD and NPNT), HHIP, HTR4 and SOX9. Within the genes listed in Table 2 it is of particular note that there are a number of SNPs in genes implicated in the control of lung development which also show evidence of association with COPD risk, namely HHIP (SHH pathway), FGF7 (Fibroblast Growth Factor pathway) and SOX9 (Wnt/β-catenin pathway).

Genetics of lung development

Gene expression across lung development is a complex and intricately timed process. Several signalling pathways in particular are considered key for correct lung development (Table 3). Lung development is also sub-divided into five distinct developmental stages (Fig. 2), each governed by specific signalling cascades (Table 4).

Table 3 Key signalling pathways involved in mammalian lung development

Fig. 2

Five stages of lung development. Stages of lung development in humans: Diagrammatic timeline of the developmental organisation of the mammalian respiratory system. At the embryonic stage, the major airways are formed. During the canalicular stage epithelial differentiation occurs and the air-blood barrier is formed. In the saccular stage of lung development, air spaces expand and finally at the alveolar stage, secondary septation occurs. Reprinted from [102] with permission from Elsevier, copyright (2007).

Table 4 Stages and events during lung development in humans and mice

The mammalian respiratory system originates from the anterior foregut endoderm in the foetus and develops into a structure specialised for gas exchange. During embryogenesis and the following pseudoglandular stage, the two lung buds begin a complex process of branching morphogenesis, a highly regulated process generating a tree-like structure of epithelial tubes branching by dichotomy [39]. Branching is driven by a number of signalling pathways communicating between the mesenchyme and the epithelium, directing the growth and patterning of lung buds (Table 3). Branching morphogenesis is a critical time during lung development, determining lung resistance and compliance in adult life. As discussed above, these determinants of airway function can be quantified by FEV1 and FEV1/FVC measures, and therefore polymorphic variation in genes active during the period of airway branching could feasibly be linked to adult lung function [40].

Of the highly complex signalling systems, Sonic Hedgehog (SHH) and Fibroblast growth factor (FGF) are considered two of the primary signalling pathways critical for lung development [39]. The critical separation of the trachea from the oesophagus is influenced by SHH signalling and FGF patterning, with both pathways initially involved in determining distal airway development [41, 42]. Furthermore, the transcription factor Nkx2.1 marks the future trachea and Wnt signalling works alongside to specify lung fate [43]. In relation to lung diseases, although regeneration and repair of injured lung tissue are not currently fully understood, it can be hypothesised that these events follow the same or similar pathways as those used during lung development outlined here. Therefore, it is important to understand any potential associations between genes involved in both COPD and lung development. For instance, of the genes associated with COPD in Table 2, SOX9, HHIP, MMP12, HTR4 and FGF7 also have distinct roles during lung development. SOX9, HHIP and FGF7 are involved in airway branching morphogenesis, typically with expression levels peaking during the embryonic and pseudoglandular stages [44–53]. SOX9 expression can be modulated by a number of key pathways including HH, Wnt/β-catenin, Notch, TGF-β, NF-κB, BMP and FGF [54]. Additional genes of interest, due to varying expression patterns across and associations with lung development, include EFCAB4A, CHID1, ANO3, AKAP1, TGFβ2, GSTCD and NPNT [20, 21, 55–60]. These genes have been demonstrated to be significantly associated with COPD (Table 2) and show preliminary evidence for involvement with lung development, with a common feature of varied expression during branching morphogenesis stages. SNPs in the genes AGER, HHIP and TNS1 are associated with reduced airway calibre and may be involved in lung development and growth [47].
In summary, therefore, it appears that many of the genes which potentially underlie the associations seen in GWA studies of lung function and/or COPD are involved in the control of lung development and potentially remodelling. Some additional support for a role in lung development comes from the observation that associations with lung function are present across the age spectrum, although the number of studies in younger age groups and children to date has been small. Furthermore, the mechanism of action of potential susceptibility genes can vary: genetic susceptibility could lead to dysregulated lung development (as discussed) during childhood or adolescence, or may lead to enhanced decline of FEV1 in adulthood, which has long been considered the most common indicator of COPD [61–63]. A recent study has indicated that approximately half of the individuals who meet the criteria for COPD in later life (COPD at grade 2 or higher according to the Global Initiative for Chronic Obstructive Lung Disease (GOLD) grading system) [4] attain a normal maximal FEV1 in young adulthood and then have accelerated rates of decline [64]. However, the authors also suggest that a substantial proportion of patients with COPD may not have had a rapid decline in FEV1, as the other half of participants followed a more typical decline in FEV1 starting from a low initial value of FEV1. Hence this may indicate populations of COPD patients with different rates of decline in FEV1, potentially a consequence of dysregulated lung development or an earlier rapid decline in FEV1 [64]. Additionally, it can be reasoned that the most important determinant of maximally attained lung function later in life is lung function at a younger age, as shown by several studies involving children and young adults [65–68], which may indicate initial dysregulated lung development.

In 2004, de Marco et al. also proposed that the origin of COPD could lie in an earlier age group than is usually believed, as a considerable percentage of subjects aged 20–44 years reported already suffering from COPD and GOLD stage 0 chronic symptoms [69]. The group later identified a subgroup of young adult subjects with a high risk of developing COPD, independently of smoking habits [70].

Epigenetic considerations in lung function and COPD

Epigenetics is commonly defined as heritable changes to gene expression, independent of changes to DNA sequence. Whereas genetic changes in DNA sequence involve variation of nucleotides, epigenetic changes involve altered methylation patterns at CpG sites or modifications to chromatin, influencing the level of DNA folding and therefore the levels of transcription at a particular gene. This area of research investigates the link between the lifetime exposures of parents and the influence these may have on epigenetic patterns in children. Although epigenetics consists of dynamic and modifiable processes which can change over time, it is of key interest as these changes can persist across generations [71].

Typically, COPD is classed as a disease of later life, although as discussed above predisposition to COPD may also have an early origin during lung development. In particular, smoking during pregnancy has been investigated to understand the effects of smoking exposure on lung development, as it is suggested that susceptibility to environmental factors is highest during this period and changes may contribute to adult airflow limitation [72]. Furthermore, maternal smoking has been demonstrated to be associated with lower adult lung volume, independent of post-natal exposure and of personal smoking [72–76]. Of the wide range of components in tobacco smoke, nicotine is thought to be the key component which alters lung development, principally because it is easily transferred to the foetus in utero in circulating blood [77–81]. Importantly, approximately 12-22 % of women smoke during pregnancy [82–87]. Data from animal studies and observations in humans show that smoking during pregnancy is associated with lower lung function in offspring and with increased airway smooth muscle, decreased alveolar surface area and collagen deposition [78, 79, 88, 89]. Effects on lung function such as these may be attributable to epigenetic changes which may lead to a predisposition to developing COPD. For instance, exposure to nicotine in utero has been demonstrated to increase DNA methylation and acetylation in the foetus, which would be predicted to produce down-regulation and up-regulation of transcriptional activity, respectively, in the relevant target genes [77].

However, few studies have identified alterations at specific epigenetic markers in response to maternal smoking and COPD. Nevertheless, an interesting direction may be altered methylation patterns in repetitive elements across the genome. In 2009, Breton et al. demonstrated that pre-natal smoking was associated with methylation patterns in repetitive elements, such as AluYb8, also in conjunction with null genotypes in genes involved in tobacco smoke metabolism (GSTM1 and GSTP1) [90]. This study suggests that differential methylation changes may depend upon the genotype of a foetus, which would therefore determine the level of susceptibility to smoke-induced epigenetic alterations [90]. The group also showed that smoking during pregnancy was associated with global hypomethylation, which is suggested to lead to chromosomal instability [90].

With the growing interest in nicotine replacement therapy (NRT) as a seemingly healthier alternative to smoking, the evidence outlined here is a reminder that use of NRT may not be a safe alternative to smoking during pregnancy [91, 92], as NRT would still be predicted to exert epigenetic effects which could alter lung development. Furthermore, maternal smoking has been found to synergise with personal smoking to increase airflow limitation and risk for development of COPD [75].

Characterisation of INTS12, GSTCD and HTR4: examples of genes with potential roles in lung development

We have recently provided evidence indicating a possible role for genetic variation near or at the integrator complex subunit 12 gene (INTS12, 4q24) in influencing lung function measures [21]. We reported that there is a significant positive correlation between INTS12 expression in lung tissue and percent predicted FEV1. The same was true for the nearby Glutathione S-transferase, C-terminal domain containing gene, GSTCD, and we hypothesised that these genes share the same promoter region due to the fact that they are co-ordinately transcribed. The two genes are also co-expressed in cells of the lung and whole lung tissue. Interrogation of the publicly available ENCODE dataset revealed that the presumed shared promoter contains CpG islands as well as transcription factor binding sites. Most importantly, SNPs that are genome-wide significant for lung function are in cis-eQTL with INTS12 expression in various tissue types, and this was not observed for GSTCD nor for any gene in strong linkage disequilibrium (LD) with INTS12. By immunohistochemistry of fixed human sections, we have previously shown that GSTCD protein expression was ubiquitous, whereas INTS12 expression was predominantly in epithelial cells and pneumocytes. During human fetal lung development, GSTCD protein expression was observed to be highest at the earlier pseudoglandular stage (10–12 weeks) compared with the later canalicular stage (17–19 weeks), whereas INTS12 expression levels did not alter throughout these stages. Although this work demonstrates potential roles of INTS12 and GSTCD as drivers of the association signal for lung function, much more work is required to ultimately bridge the gap between the 4q24 GWA study findings and how these influence lung function. A separate gene our research group has actively studied is the lung function and COPD associated serotonin receptor, HTR4.
We identified that the protein level of HTR4 increased throughout lung development; however HTR4 was expressed only at very low mRNA and protein levels in adult lung [50], again suggesting a potential role in lung development.

Models to study candidate lung function/COPD genes: new approaches

As we have noted, although GWA studies have been successful at detecting genomic loci harbouring variants predicting variation in lung function measures and risk of COPD, these genetic associations are usually limited to identifying fairly broad genomic regions and are incapable of distinguishing causal variants from non-causal variants [93]. Therefore, despite the unprecedented success of GWA studies, the therapeutic and functional translation of these studies is still in its infancy. There are a number of experimental approaches and models that may be used to functionally translate genetic findings. These methods can help in dissecting the genetic association signals for the currently considered respiratory phenotypes and in identifying underlying alleles and biological pathways that are important in lung function and COPD. Computational methods can be used to combine experimentally generated regulatory information of the human genome, such as ChIP-seq (chromatin immuno-precipitation sequencing) generated binding sites or gene expression Quantitative Trait Loci (eQTL), with respiratory loci [93, 94]. The classical scheme of following up GWA study associations concentrates on manipulation of single genes (for example generating transgenic mice which have the gene deleted or overexpressed), but this method is inevitably slow, particularly given that genetic association data suggest the presence of a multitude of gene variants on different chromosomes predicting disease risk or lung function measure outcome [7, 19, 36, 95, 96]. The recent development of the CRISPR-Cas9 activation system, which allows simultaneous enhancement of endogenous expression of multiple genes, may speed up functional follow up of key genetic variants [97]. Additionally, enhancing endogenous gene expression from a natural promoter is more likely to recapitulate splicing complexity than the traditional transient or stable recombinant DNA transfection approach [97].
RNA interference (RNAi) gene silencing has successfully been used to knock down genes of interest and, following downstream analyses, novel gene functions have been identified. However, with RNAi-based approaches, the data require in-depth complex analysis. Ideally, more than one siRNA or shRNA should be utilised, because off-target silencing effects can produce false positive observations which may obscure true results [98]. This limitation can also be addressed with the advent of CRISPR and TALEN gene editing technology, which allows generation of specific gene knock-out cells with the potential for several individual gene knock-outs in combination [99]. Of note, decreases in the cost of next generation high-throughput sequencing have addressed a number of limitations faced by microarray-based approaches and allow effective discovery of biological pathways underpinning respiratory phenotypes, for example by RNA-sequencing and ChIP-sequencing approaches [100]. This information could be used to make informed decisions about relevant cellular assays post genetic manipulation. Investigating respiratory phenotypes in lung tissue from specific gene knockout mice is also a valuable in vivo approach that can effectively complement in vitro work [101].

Conclusions

In conclusion, recent advances in large GWA studies and meta-analysis of results obtained across different studies have led to the identification of a large number of loci which predict lung function variability. An increasing number of these loci have also been demonstrated to show association with COPD risk per se. However, despite these advances, only a small proportion of the variability in lung function can be explained by the genetic variants described to date. This suggests many other variants are yet to be uncovered which may also contribute to the genetic basis of airflow obstruction. It is notable that many of the genetic regions which have been identified to date harbour genes which play an important role in lung development. Whether or not this means these genes are less likely to be useful targets for therapeutic manipulation remains to be defined. However, there is no doubt that understanding the role of these genes in the regulation of lung function will be key to improving our knowledge of the pathophysiology of COPD and other diseases characterised by airflow obstruction.

The observation that genes associated with lung function and COPD also show evidence of differential expression during lung development makes them good candidates for playing critical roles in embryological lung development. However, more studies are warranted to demonstrate, through carefully controlled experiments, that SNP mutagenesis in those genes or whole gene knockout models display effects on lung morphogenesis or activity. If shown to be the case, it would give more credence to the ‘Dutch hypothesis’, which states that COPD and asthma are essentially different manifestations of the same disease process. This is because the hypothesis was originally based on the observation that there is a fluent development from bronchitis in youth to a more asthmatic picture in adults, which then further develops into bronchitis among more elderly patients. Therefore, the existence of genetic variants predisposing to pathobiology of lung development may be expected under this scenario.