Background

Chronic obstructive pulmonary disease (COPD) is a disease with both environmental and genetic risk factors [1, 2]. Several genome-wide association studies (GWAS) have demonstrated multiple autosomal associations [3]. However, much of the overall heritability of COPD remains unexplained [4]. One potential source of additional information is genetic variation on the X chromosome, which has been excluded in most prior COPD GWAS [5].

Inclusion of the X chromosome in association studies requires modified quality control procedures and poses statistical challenges distinct from autosomes, including the presence of only one X chromosome in males and random inactivation of one X in females [5, 6]. Standard GWAS methodology for autosomal variants relies on the fact that each locus has 2 alleles and thus each paired autosomal variant generally has three possible genotypes, 0/1/2. If these methods are applied to the X chromosome, where males are hemizygous, the X variants would be coded as 0/1 for males and 0/1/2 for females. This may limit the ability to detect associations compared to methods that consider male X chromosome genetic variants as 0/2 [7]. Underutilization of genetic information on the X chromosome has been noted along with calls for greater incorporation of allosomal data in future GWAS of complex human traits and diseases [8].

The motivation of this study is to properly include and assess the X chromosome variation on a genome-wide level to more completely understand sex differences in COPD. Women are more susceptible to severe, early-onset COPD [9]. Disease features such as emphysema and exacerbations show sex differences, as do disease manifestations of dyspnea, depression, and anxiety [10, 11]. Several GWAS have begun incorporating the X chromosome [12,13,14]. Variants associated with lung function on the X chromosome have been identified in previous GWAS [14,15,16]. There has been limited prior interrogation of the X chromosome in GWAS of COPD, and related phenotypes including emphysema, that employed comprehensive accounting for the X chromosome and included sex-stratified analysis [16, 17]. Prior studies have shown the significance of sex differences and the X chromosome in cardiovascular disease and related traits, but similar examinations have not been performed in COPD [18].

This study aims to perform an X chromosome-wide association study (XWAS) and meta-analyses of COPD datasets, including sex-stratified analyses to test for association between genetic loci on the X chromosome with COPD and related phenotypes of lung function and quantitative emphysema (emphysema) on chest computed tomography (CT) scans. To achieve this, we utilize quality control and statistical methods that specifically consider X chromosome inheritance patterns and inactivation. We hypothesize that the X chromosome harbors variants important in determining risk of COPD and related quantitative phenotypes, and that X chromosome variants may drive some sex differences in COPD manifestations.

Some results have been previously reported as an abstract [19].

Methods

Study participants and phenotyping

Study participants were current and former smokers in three previously described studies: Genetic Epidemiology of COPD Study (COPDGene, non-Hispanic white subset), Genetics of COPD in Norway Study (GenKOLS), and Evaluation of COPD Longitudinally to Identify Predictive Surrogate Endpoints Study (ECLIPSE) [1, 20, 21]. Subjects with COPD (cases) were defined by Global Initiative for Chronic Obstructive Lung Disease (GOLD) airflow limitation severity grades 2–4, on the basis of standardized post-bronchodilator spirometry with Forced Expiratory Volume in one second (FEV1) < 80% predicted, and FEV1 to Forced Vital Capacity (FVC) ratio < 0.7 [22]. Control subjects (controls) included smokers with normal spirometry (FEV1 ≥ 80% predicted, FEV1/FVC ≥ 0.7). Quantitative emphysema (emphysema) was defined as the natural logarithm of the percentage of lung voxels with density less than −950 Hounsfield units on inspiratory chest CT images (log −950), determined using Thirona software (http://www.thirona.eu) for COPDGene, and Slicer software (http://www.slicer.org) for ECLIPSE and GenKOLS [23].
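The case, control, and emphysema definitions above can be expressed as a short sketch. The function and argument names are hypothetical and chosen only for illustration, and the sketch assumes that percent-predicted FEV1 and the FEV1/FVC ratio have already been computed from post-bronchodilator spirometry. Subjects meeting neither definition are returned as unclassified.

```python
import math
from typing import Optional


def classify_copd(fev1_pct_pred: float, fev1_fvc: float) -> Optional[str]:
    """Return 'case', 'control', or None using the thresholds stated in the text."""
    if fev1_pct_pred < 80 and fev1_fvc < 0.7:
        return "case"      # GOLD airflow limitation grades 2-4
    if fev1_pct_pred >= 80 and fev1_fvc >= 0.7:
        return "control"   # smoker with normal spirometry
    return None            # meets neither definition (assumed to be excluded)


def log_emphysema(pct_voxels_below_minus_950: float) -> float:
    """Natural log of the percentage of lung voxels below -950 Hounsfield units.

    Assumes a strictly positive percentage; math.log raises ValueError otherwise.
    """
    return math.log(pct_voxels_below_minus_950)
```

For example, `classify_copd(55, 0.60)` gives `"case"` and `classify_copd(95, 0.78)` gives `"control"`.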

Study subjects provided written informed consent and each study's research protocol was approved by institutional review boards at participating institutions. Study phenotypes are further defined in the supplement (Additional file 1). Datasets from COPDGene (accession number phs000179.v6.p2) and ECLIPSE (accession number phs001252.v1.p1) are publicly available in dbGaP.

Genotyping, quality control, and imputation

Genotyping for all studies was performed using various Illumina (San Diego, CA) platforms. Genotyping and initial quality control methods have been previously described, and included cleaning based on subject-level missingness, cryptic relatedness, and sex checks based on X and Y chromosomes [21, 24]. Principal components of genetic ancestry were generated separately, based on genotyped autosomal data in each case–control population using EIGENSOFT, as previously described. The pseudoautosomal region was excluded from the X chromosome prior to association analysis, defined for human genome build 19 (GRCh37/hg19, https://www.ncbi.nlm.nih.gov/grc/human) by X chromosome base pair coordinates 60,001–2,699,520 and 154,931,044–155,260,560.
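As an illustration of the pseudoautosomal exclusion, the sketch below filters X chromosome base pair positions against the two GRCh37/hg19 coordinate ranges given above. The names `PAR_HG19`, `in_par`, and `drop_par` are hypothetical, and the sketch is not the tooling used in the study.

```python
# Pseudoautosomal coordinate ranges on the X chromosome (GRCh37/hg19), as stated in the text.
PAR_HG19 = ((60_001, 2_699_520), (154_931_044, 155_260_560))


def in_par(pos: int) -> bool:
    """True if a 1-based X chromosome position lies inside either range (inclusive)."""
    return any(start <= pos <= end for start, end in PAR_HG19)


def drop_par(positions):
    """Keep only positions outside the pseudoautosomal ranges."""
    return [p for p in positions if not in_par(p)]
```

For example, `drop_par([100_000, 15_579_156, 155_000_000])` keeps only `15_579_156`.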

Additional quality control steps for variants on the X chromosome followed published recommendations and were performed using PLINK v1.9 (Fig. 1) [5, 25]. A full description of study-based, subject-level data cleaning and of imputation methods is in the supplement (Additional file 1).

Fig. 1 Cleaning and imputation by study. Steps in subject cleaning, variant cleaning, and imputation prior to X chromosome association analysis. Numbers in parentheses represent subjects or variants removed. Abbreviations: Chr chromosome, SNP single nucleotide polymorphism, MAF minor allele frequency, QC quality control

Statistical analysis

X chromosome variants were tested in each of the three studies independently for association with four individual phenotypes: COPD case–control status, FEV1 in liters, FEV1/FVC, and quantitative emphysema, using logistic regression for COPD status and linear regression for the quantitative phenotypes. XWAS were run in all subjects and additionally stratified by sex, as demonstrated in Additional file 1: Figure S1a–c for COPDGene. Regression models for each phenotype were adjusted for age, pack-years of smoking, and principal components of genetic ancestry for all analyses [6, 25, 26]. For FEV1 in liters, height was included as a covariate. For the analyses of all subjects, sex was included as a covariate (females 0, males 1). For the emphysema analyses, additional covariates included current smoking, body mass index, and scanner model if more than one scanner was used in that study. Variants were excluded if they had a minor allele frequency < 0.01, were multiallelic, or, in the COPD case–control XWAS, were present in only one group.
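The minor allele frequency filter can be sketched as follows. The allele counting is an assumption made for illustration, not a description of the software used in the study: each female contributes two X alleles and each hemizygous male contributes one. The function names are hypothetical.

```python
def x_alt_allele_freq(female_alt_counts, male_alt_counts):
    """Alternate allele frequency on the X chromosome.

    female_alt_counts: list of 0/1/2 alternate allele counts (two X copies each).
    male_alt_counts:   list of 0/1 alternate allele counts (one X copy each).
    """
    alt = sum(female_alt_counts) + sum(male_alt_counts)
    total = 2 * len(female_alt_counts) + len(male_alt_counts)
    return alt / total


def passes_maf(freq: float, threshold: float = 0.01) -> bool:
    """Keep a variant only if its minor allele frequency is at least the threshold."""
    maf = min(freq, 1.0 - freq)
    return maf >= threshold
```

With four females carrying `[0, 1, 2, 0]` alternate alleles and two males carrying `[0, 1]`, there are 4 alternate alleles among 10 X chromosomes, so the frequency is 0.4 and the variant is retained.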

Clayton’s method for XWAS was employed, which accounts for both female X chromosome inactivation (XCI) and male X chromosome hemizygosity [6, 27]. Males carry only one copy of the X chromosome, while in females most loci are subject to XCI, so that approximately half of the cells in a female have one copy active and the remainder have the other copy active. Males are therefore equivalent to homozygous females with respect to such loci. Clayton’s method for analysis of the X chromosome, implemented in PLINK v2.0, performs logistic regression models with one degree of freedom, encoding female loci as 0 (homozygous for reference allele), 1 (heterozygous), or 2 (homozygous for alternate allele) and male loci as 0 (no copies of alternate allele) or 2 (single copy of alternate allele). With this approach the heterozygous female genotype falls between the two homozygous genotypes on the linear predictor scale, thus accounting for XCI, as only 50% of cells will have a normal active allele [27].
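The genotype coding described above can be written as a small helper. This is a minimal sketch of the coding scheme only, not of the PLINK implementation. The function name and the `"F"`/`"M"` sex labels are hypothetical.

```python
def clayton_dosage(alt_copies: int, sex: str) -> int:
    """Code an X chromosome genotype as described in the text.

    Females: 0, 1, or 2 alternate alleles are coded 0, 1, 2.
    Males:   0 or 1 alternate allele is coded 0 or 2, so that a hemizygous
             male is treated like a homozygous female.
    """
    if sex == "F":
        if alt_copies not in (0, 1, 2):
            raise ValueError("female alternate allele count must be 0, 1, or 2")
        return alt_copies
    if sex == "M":
        if alt_copies not in (0, 1):
            raise ValueError("male alternate allele count must be 0 or 1")
        return 2 * alt_copies
    raise ValueError("sex must be 'F' or 'M'")
```

A male carrying the alternate allele is coded 2, the same as a homozygous alternate female, while a heterozygous female is coded 1.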

Results were subsequently combined into fixed-effects meta-analyses using all quality-controlled variants from the COPDGene, ECLIPSE, and GenKOLS studies with PLINK v1.9 (Additional file 1: Figure S2). Meta-analyses were run for each of the four phenotypes, both for all subjects and separately in sex-stratified datasets. Variants were excluded if only present in one study. Testing for sex difference was performed by comparing effect estimates between males and females among the top suggested associations in the sex-stratified meta-analysis, with sex-difference significance assessed at P < 0.05 [28]. A description of methods for significance thresholds, the suggestive level of association examined, and annotation is in the supplement (Additional file 1).
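A fixed-effects meta-analysis of per-study effect estimates is commonly computed by inverse-variance weighting. The sketch below assumes that standard form and a normal approximation for the two-sided p value. It is illustrative only and is not the PLINK code path used in the study. The function name and inputs are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance weighted fixed-effects estimate.

    betas: per-study effect estimates for one variant.
    ses:   matching per-study standard errors (all > 0).
    Returns (pooled beta, pooled standard error, two-sided p value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return beta, se, p
```

Under this weighting, studies with smaller standard errors contribute more to the pooled estimate, and combining studies with equal standard errors shrinks the pooled standard error by the square root of the number of studies.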

Replication

Ten X chromosome variants previously discovered in GWAS for lung function were examined for replication in the meta-analysis results for this XWAS [14, 16, 17]. One variant, rs28382751 from Shrine et al., was not evaluated as it was not present in our study.

Results

Subjects

A total of 10,193 subjects of non-Hispanic white or European ancestry with X chromosome data were included in the analysis. Baseline characteristics and summary statistics for each dataset among all subjects and in sex-stratified subsets are shown in Table 1. The largest study population was from COPDGene, with 6631 subjects, while GenKOLS had 1658 subjects and ECLIPSE 1904 subjects. Compared to the other two studies, GenKOLS had fewer mean pack-years of smoking history and a higher proportion of current smokers. Compared to the other two studies, ECLIPSE, which enrolled a majority of subjects with COPD and comparatively few control smokers, had fewer female subjects and more severe disease. Within each study, female subjects were slightly younger than male subjects, had a shorter pack-year smoking history, and had less severe disease, characterized by higher lung function, less emphysema (especially among females in ECLIPSE), and a lower proportion of subjects with COPD.

Table 1 Characteristics of study populations

Subject-level cleaning, variant-level cleaning, and imputation of genotyped data were done individually for each study (Fig. 1). The resultant variants were included in the XWAS, with 618,318 genotyped and imputed variants in COPDGene, 388,274 in GenKOLS, and 485,006 in ECLIPSE. XWAS were performed separately for each phenotype among all subjects and then in sex-stratified analyses (Additional file 1: Figure S1a–c demonstrates the XWAS analyses for COPDGene). This included association studies for each of the four phenotypes assessed separately in each of the three study strata: all subjects, males, and females. Results were combined using fixed-effects meta-analyses, one for each of the four phenotypes (Additional file 1: Figure S2), with 223,295–224,268 variants included in each meta-analysis depending on phenotype and population.

XWAS

Top suggestive associations from the all-subjects and sex-stratified XWAS meta-analyses are presented in Table 2 and Additional file 2: Table S1. Figure 2 and Additional file 1: Figure S3 provide locus plot visualizations of selected top association results. Quantile–Quantile and Manhattan plots for the meta-analyses are shown in Additional file 1: Figure S4. The top association, rs5979771, a variant closest to TMSB4X, achieved genome-wide significance for association with FEV1/FVC among all subjects (Beta (\(\beta\)) 0.020, standard error (SE) 0.004, p \(4.97 \times 10^{-8}\)). The same variant was also the top suggestive variant for association with FEV1 among all subjects. No other variants reached genome-wide significance, but suggestive associations approaching genome-wide significance were identified. Power calculations can be found in Additional file 1.

Table 2 XWAS Meta-analysis Top Suggested Associations in COPD Related Phenotypes
Fig. 2 Meta-analysis locus plots. Plots for rs5979771, the genome-wide association for FEV1 near TMSB4X in the XWAS meta-analysis among all subjects and in sex-stratified populations. Abbreviations: COPD chronic obstructive pulmonary disease, XWAS X chromosome-wide association study, FEV1 forced expiratory volume in one second, L liters, FVC forced vital capacity

Sex differences

Testing for sex difference compared effect estimates among the top suggested associations in sex-stratified meta-analysis for males and females (Table 2). Testing was run in 32 variants, which included 28 unique variants annotated to 25 unique genes. There were significant sex differences identified among 20 of the variants, which implicated 17 unique genes based on annotation to the closest gene. These variants were found across all four phenotypes, including 7 variants with larger effect in males and 13 variants with larger effect in females. There were two genes, DMD and POU3F4, both implicated by more than one variant among the top suggested associations, where the sex-effect was different dependent on the variant.
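One common way to compare effect estimates between two independent strata is a z test on the difference in betas. The sketch below assumes that form, with no correlation term between the male and female estimates. It is offered as an illustration and may differ in detail from the method of reference [28]. All names are hypothetical.

```python
import math


def sex_difference_test(beta_f, se_f, beta_m, se_m, alpha=0.05):
    """Two-sided z test for a difference between female and male effect estimates.

    Assumes the two strata are independent, so the variance of the difference
    is the sum of the two squared standard errors.
    Returns (z statistic, p value, significant at alpha).
    """
    z = (beta_f - beta_m) / math.sqrt(se_f ** 2 + se_m ** 2)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p value
    return z, p, p < alpha
```

For instance, estimates of 0.05 (SE 0.01) in females and −0.01 (SE 0.02) in males differ by 0.06 with a standard error of about 0.022, which would meet the P < 0.05 threshold under these assumptions.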

ACE2

The X chromosome gene ACE2 has received recent attention for its role in SARS-CoV-2 susceptibility. SARS-CoV-2 severity has been shown to vary by sex, and there is increased risk for severe disease in those with chronic respiratory conditions including COPD [29, 30]. Meta-analysis results for variants in ACE2, located at Xp22.2 (base pair region 15,579,156–15,620,192), were examined, and top associations for each phenotype and sex stratum are shown in Additional file 3: Table S2. None of the ACE2 variants approached genome-wide significance.

Replication

Examination of ten X chromosome variants previously discovered to have genome-wide significant associations with lung function in prior studies included one variant, rs142755000, that was among the top suggested associations in this meta-analysis XWAS for FEV1 (\(\beta\) −0.134, SE 0.029, p \(4.30 \times 10^{-6}\)) and FEV1/FVC (\(\beta\) −0.028, SE 0.006, p \(3.36 \times 10^{-6}\)), with the effect allele A having a frequency of 0.03 (Additional file 4: Table S3). This was comparable to the results of Zhao et al., who found, among 5768 subjects in a COPD-enriched population of White race including COPDGene, that rs142755000 had a genome-wide significant association with FEV1 (\(\beta\) −0.18, SE 0.03, p \(3.58 \times 10^{-8}\)), also with effect allele A at a frequency of 0.03, and a similar appearance of the locus zoom plot [17]. A description of annotation for rs142755000 is in the supplement (Additional file 1).

Discussion

This meta-analysis is the first comprehensive examination of markers on the X chromosome specifically testing for association with COPD and COPD-related phenotypes of pulmonary function and emphysema in three case–control populations. Using stringent data quality control and statistical methods specific to the X chromosome, we identified a genome-wide significant association with FEV1/FVC ratio for one variant near TMSB4X. There were additional suggestive associations with COPD-related phenotypes in analyses of all subjects and in sex-stratified XWAS. These findings emphasize the importance of properly including X chromosome variants in association analyses. They also support the important role sex differences play in the pathobiology of complex lung disease, including factors such as escape from XCI as seen with TMSB4X, and sex differences in effect size, as seen among 20 of the top suggested variants in stratified analysis.

Genetic risk factors for COPD have only been partially determined [2]. The X chromosome may contribute additional genetic risk that could explain some of the missing heritability for this complex disease not accounted for by autosomal GWAS results [3, 8]. COPD is a heterogeneous disease that also shows sex-specific pathobiology [10]. Females have been suggested to have earlier onset of disease and greater lung function decline per daily number of cigarettes smoked, while males may have more emphysema by quantitative imaging [9, 11]. Hardin et al. previously reported an autosomal variant with sex-specific associations with COPD [4]. A larger GWAS by Sakornsakolpat et al., in which 82 autosomal COPD variants were identified, did not show sex-specific effects; this question may be better answered by examining the X chromosome [3].

The X chromosome has been included in prior lung function GWAS, although not routinely, and in that setting has been included in COPD investigations, though there has not previously been a direct XWAS of COPD and related phenotypes [14,15,16,17]. A lung function GWAS including the X chromosome by Soler Artigas et al. identified one variant at genome-wide significance that does not replicate in this study [14]. Wyss and colleagues included the X chromosome in their assessment of lung function and did not find any significant associations [15]. A large GWAS by Shrine et al. looked at a genetic risk score for lung function, including examination of the X chromosome with identification of five variants at genome-wide significance in the UK Biobank; however, these variants did not replicate in SpiroMeta and no variants met the study threshold for inclusion in their genetic risk score [16]. None of these variants replicated in our current XWAS. Most recently, Zhao and colleagues examined pulmonary function and COPD in whole genome sequencing data, identifying four variant signals on the X chromosome for FVC and one for FEV1, rs142755000, which was in a COPD-enriched stratum [17]. This FEV1 variant near HMGN5 was among the top suggested associations in our current meta-analysis XWAS for FEV1 and FEV1/FVC.

In this analysis we demonstrated the importance of proper inclusion of the X chromosome as it harbors variants with association and suggestive association with COPD and COPD-related phenotypes. A variant near TMSB4X reached the genome-wide significance threshold for association with lung function in all subjects for FEV1/FVC and showed a suggestive association with FEV1. In the emphysema XWAS, the top suggestive association in all subjects was a variant in FRMPD4, located 332 kb upstream of the TMSB4X variant in the Xp22.2 locus; both FRMPD4 and TMSB4X escape XCI [31]. TMSB4X encodes an actin sequestering protein, thymosin \(\beta\)4, that plays a role in regulation of actin polymerization and is important in organization of the cytoskeleton, as well as being involved in cell proliferation, migration, and differentiation [32]. TMSB4X is expressed equivalently in both male and female lung tissue but has been observed to have male-biased expression in skin, adipose, and kidney [33,34,35]. A comprehensive analysis of transcriptome sequencing data in COPD lung tissue demonstrated decreased TMSB4X expression in COPD compared to normal tissue based on both RNA-seq and quantitative real-time PCR [36]. Thymosin \(\beta\)4 and one of its methionine oxidation products, thymosin \(\beta\)4 sulfoxide, have been found at increased levels in bronchoalveolar lavage fluid of smokers [37]. In smokers, methionine oxidation plays a role in \(\alpha\)1-antitrypsin inactivation and pathologic lung remodeling [37, 38]. Thymosin \(\beta\)4 is thought to limit inflammation through autophagy and has been found to have a protective effect in interstitial lung diseases including scleroderma, bleomycin-induced lung damage, and reperfusion-induced acute lung injury [37, 39,40,41]. A variant near TMSB4X has been associated with risk of childhood onset asthma in the UK Biobank; although it is not in linkage disequilibrium with the variant identified in this study, the direction of effect is the same [42, 43].

There were additional suggestive associations found among the all-subject XWAS. In the lung function XWAS, variants at the Xq21.1 locus, including variants in HMGN5 and near SH3BGRL, were implicated; both genes are expressed in lung tissue [33]. A variant in HMGN5, rs185387095, was a top suggested association in this study, and a nearby variant in linkage disequilibrium with it, rs142755000, was also suggested; rs142755000 was identified in association with FEV1 in prior whole genome sequence analysis of COPD-enriched White race populations including COPDGene [17]. HMGN5 modulates cellular transcription and, in a murine model, mutations in HMGN5 are associated with a lung function phenotype on pulmonary function tests as well as an emphysema-like phenotype [44].

The sex-stratified analysis of COPD-related phenotypes provides additional information and suggests functional associations, based on size and direction of effect, that were not identified in the XWAS of all subjects. There was a larger effect in females found in sex-stratified testing among top suggested variants: two for COPD, one for FEV1, five for FEV1/FVC, and five for emphysema. This included different directions of effect for one variant for lung function near ITM2A and four variants for emphysema near TAB3, TBX22, GUCY2F, and FMR1-AS1. There was a larger effect in males found in sex-stratified testing among top suggested variants: one for COPD, five for lung function, and one for emphysema. This included different directions of effect for two variants for lung function, one in OPHN1 and one near ARHGAP36, as well as one variant for emphysema near SLITRK2. Further discussion of suggestive associations can be found in Additional file 1.

One source of potential disease pathology for some of these variants is escape from XCI [45, 46]. Mammalian females carry two copies of chromosome X, and the majority of X-linked human genes are subject to XCI, where one of the two copies is silenced. However, at least 15–23% of genes escape XCI to some extent, so that both X chromosome copies are expressed, with a variable continuum of expression likely due to epigenetic effects [31, 34, 47]. Escape from XCI is not randomly distributed, with the majority of escape genes found on the short arm of chromosome X, a region enriched for male-biased and female-biased genes [34, 35, 48]. This is likely related to the evolutionary history of the sex chromosomes, where the short arm of the X chromosome is a recent addition to an ancestral chromosome [48]. The impact of XCI is complex and variable across individuals and tissue types, and without functional studies it is not possible to know if the closest annotated genes are causal [34, 35]. TMSB4X, the closest gene to the top variant implicated in this analysis, is known to escape XCI, though there is not a significant sex difference in the current analysis. Evidence of escape from XCI has been reported for other genes implicated by suggested associations in or near DMD, HTR2C, and GUCY2F, which demonstrate a significant sex difference in our analysis, as well as SH3KBP1 and FRMPD4, which do not have significant sex differences in the current analysis [31, 32, 34, 35, 49, 50]. Future investigation of gene expression and methylation patterns together with genetic variation may reveal pathologic relevance for these genes that escape XCI despite nominal evidence for sex-specific genetic association.

Escape from XCI results in sex biases in gene expression and has implications for the role of the X chromosome in human diseases, including intellectual disability, autoimmune disease, and cancer [34, 35, 45, 46, 48, 51]. It has been suggested that escape from XCI is a mechanism for respiratory disease pathology in COPD and that escape from XCI influences lung tissue transcription [52]. POU3F4 was the only gene implicated by a suggested variant in the current analysis that encodes a transcription factor. In our published sex-specific network analysis of gene expression data in normal lung tissue, POU3F4 demonstrated sex-biased targeting of 15 X chromosome genes, 14 of which escape XCI, including two important XCI regulators, XIST and JPX (Additional file 1: Figure S5) [53, 54]. This points to sex-biased genetic variants as one mechanism implicated in sex-biased transcriptional targeting.

We implemented X chromosome-specific genetic association testing for our analyses, using parameters conforming to Clayton’s method for analysis to account for XCI in females and male X chromosome hemizygosity [6, 27]. Hickey and colleagues compared Clayton’s method to autosomal methods and alternate X chromosome methods and found this approach provided the best power to detect an association [7, 27]. This method has been used in several GWAS that identified significant association between X chromosome markers and complex disease traits [14, 55, 56].

Prior COPD GWAS have excluded genomic information from the X chromosome in their analyses [3, 16, 24]. Although individual reasons for excluding these data are not known, many studies exclude X chromosome genetic variants because of the statistical difficulties inherent to testing for association with the sex chromosomes. We implemented X-specific approaches to both quality control and statistical analysis to improve both accuracy and power to detect an association. Larger studies of X chromosome markers will be required to reliably break the threshold of genome-wide significance on the X for COPD and related phenotypes, but we feel that our results represent an important step in determining associations involved in smoking-related obstructive lung disease.

Our study has several limitations. Since the study cohorts are limited to smokers, we were not able to assess risk variants or effect modification among non-smokers and our results may not generalize to non-smokers. Clayton’s method provides a statistical approach to random XCI, but some areas of the X undergo non-random inactivation [27]. Future studies including allele-specific methylation will be helpful to directly investigate the effect of XCI more thoroughly. Variants with minor allele frequency of 1% or greater were included in this analysis, though the allele frequency of some of the suggestive associations was near 1%, which increases the likelihood that these suggestive associations are driven by a small number of subjects; low frequency variants should be interpreted with caution. We were unable to replicate all the previously discovered genome-wide significant associations with lung function, which may be related to differences in phenotypes examined (five variants were for FVC only, which was not examined in the current study), study design, power, and patient populations including the inclusion of current and former smokers only [14, 16, 17]. The top hits described in this study were not replicated in another population, which would be an interesting future direction. Despite these limitations, we did identify several intriguing biologically plausible genes that could play a potential role in COPD; additional investigation of allosomal genetic variation and lung disease is imperative.

Conclusions

In summary, these results represent a comprehensive analysis of markers on the X chromosome for association with COPD and related phenotypes including emphysema in three COPD case–control cohorts. We identified one genome-wide significant variant and several promising associations between markers on the X chromosome that may contribute to sex differences in COPD. Among 33 top suggested variants there were 20 unique variants that had significant sex differences in stratified analysis, seven with larger effect in males and thirteen with larger effect in females. Among the 25 top genes implicated with suggestive associations in this study, at least six have evidence for escape from XCI, including the top genome-wide association near TMSB4X. XCI may be an important contributor to disease pathology, though the impact of XCI is complex and variable. Detecting variants associated with complex traits is inherently difficult and requires large sample sizes for males and females to address the statistical complexities of studying the X chromosome. Genetic association studies in human lung disease should routinely consider systematic interrogation of X chromosome variants as these may reveal new genes for sex-specific diagnostic and therapeutic approaches to COPD.