Chronic obstructive pulmonary disease (COPD) is a heterogeneous lung disease in which recurrent bacterial infections are a major etiological factor [1,2,3,4]. The human microbiome in the respiratory tract differs between healthy subjects and COPD patients [5,6,7], shifts in composition during COPD exacerbations [8,9,10,11,12] and varies among exacerbation subtypes [9], all suggesting a close association between the lung microbiome and COPD pathophysiology with potential involvement of host immunity and inflammatory responses. It is thought that disruption of microbiome, known as dysbiosis, could trigger a dysregulated host immune response that results in infection susceptibility, inflammation and negative effects on host biology [13].

A systematic understanding of airway host-microbiome interaction in relation to COPD pathogenesis could provide the mechanistic basis for modulation of host-microbe interactions as a potential novel therapeutic strategy for COPD. A previous study on COPD patients showed that the lung microbiome was significantly associated with sputum pro-inflammatory markers especially interleukin-8 (IL8/CXCL-8, 9). In particular, there is a significant correlation between sputum interleukin-8 (IL-8/CXCL-8) with both alpha and beta diversity of the airway microbiome in COPD. In the correlation network, sputum IL-8/CXCL-8 showed the highest degree of microbiota connectivity with a significant negative correlation to 15 bacterial operational taxonomic units (OTUs), suggesting sputum IL-8/CXCL-8 could be an indicator of microbiome community structure and diversity.

Few studies have simultaneously characterized both lung microbiome and human multi-omics profiles in COPD, and in other respiratory diseases in general. Sze et al. measured the lung microbiome and host transcriptome in COPD and found Firmicutes and Proteobacteria were associated with different host gene expression profiles [14]. Molyneaux et al. profiled both lung microbiome and peripheral whole-blood transcriptome for idiopathic pulmonary fibrosis patients and identified two gene modules involved in host defense that are strongly associated with the microbiome profile [15] However, a comprehensive understanding of the collective host response at both transcriptional and protein expression levels to the lung microbiome community is lacking. A systems biology approach integrating lung microbiome and host multi-omics datasets is necessary to better understand host-microbiome interactions in COPD.

Here we performed a 16S ribosomal RNA (rRNA) gene-based survey on sputum microbiome from 16 healthy subjects and 43 COPD patients. Host sputum cell counts, transcriptome and proteome were also characterized for COPD patients. To our knowledge, this is the first study that characterizes both lung microbiome and host transcriptome and proteome profiles in stable COPD and during exacerbations. We found significant interplay between lung microbiome composition and host response in COPD that is potentially important to current treatments and future therapeutic strategies.


Patient selection

The presented study was conducted in accordance with the Declaration of Helsinki [16] and Good Clinical Practice [17]. The human biological samples were sourced ethically and in accord with the terms of the informed consents under the University of Manchester and University Hospital of South Manchester IRB/EC approved protocol (Approval number: 10/H1003/108).

Healthy subjects and COPD patients were enrolled at the Medicines Evaluation Unit (Manchester University Foundation NHS Trust Hospital). Patients with asthma, or significant respiratory disease other than COPD, or the inability to produce sputum after sputum induction were excluded from the study. Patients were seen at stable at least 6 weeks after the use of any short term antibiotics. Patients contacted the research team if they experienced a change in symptoms consistent with an acute exacerbation. Daily diary cards were used. Patients were assessed by a clinician and exacerbations defined as in increase in respiratory symptoms for two consecutive days. Smoking status, historical exacerbation frequency, GOLD status, inhaled corticosteroid (ICS) administration, Quality of Life (QoL) scores and lung function measurements (FEV1, FVC and FEV1/FVC ratio) were recorded for COPD patients (Table 1, Additional file 1: Table S1). Smoking status and lung function measurements were recorded for healthy subjects.

Table 1 Major demographic and baseline clinical features of all subjects in this study

Sputum collection

Sputum samples were collected at a single time-point from 16 healthy subjects and longitudinally from 43 COPD patients. Sputum sampling were performed prior to any systemic therapy including treatment with oral corticosteroids and/or antibiotics. Sputum samples were obtained by spontaneous expectoration or induced. For COPD patients, spontaneous expectoration was attempted first, if no sputum or too little sputum was produced, induction was then performed. For healthy subjects, only induction method was performed. Sputum samples from COPD patients were collected at stable (defined as no evidence of symptom-defined exacerbations in the preceding 4 weeks and the subsequent 2 weeks post-clinic visit), exacerbations (defined according to Anthonisen criteria [18] and/or healthcare utilization [19]), two and 6 week post-exacerbations and 6 months from first stable visit. All exacerbation sputum samples were collected prior to the institution of any exacerbation treatment. The missing samples are mostly due to patients unable to produce sufficient amount of sputum for downstream experiments (Additional file 1: Figure S1).

Sputum processing

Sputum samples were processed to obtain cell pellets and supernatant, for immune cell counting, host transcriptome and proteome analysis, according to a previous method [20]. Briefly, sputum plugs were selected from saliva and put on ice (minimum weight 0.1 g). Eight times volume of phosphate-buffered saline (PBS) was added to the sputum. The mixture was incubated in a roller mixer for 15 min on ice, vortexed every 5 min and centrifuged at 790 g for 10 min. The supernatant was split into aliquots and stored at − 80 °C for sputum proteome analysis. For cell pellets, a four-fold volume of 0.2% DTT was added and the mixture was incubated for 15 min in a roller mixer on ice, vortexed every 5 min, filtered using 48 μm nylon-mesh filter and centrifuged. Cell pellets were resuspended in 1 ml of PBS to perform haemocytometer cell counts, cytospin differential cell counts and stored at − 80 °C for transcriptomic assays.

Microbiome 16S rRNA gene sequencing

For quality control purposes, bacterial DNA extractions, sequencing and data analyses were performed in a single, centralised lab at the GlaxoSmithKline (GSK) R&D facility in Collegeville, PA, USA. The detailed procedure of bacterial genomic DNA isolation, 16S library preparation, sequencing, reagent controls, and sequence data processing was provided in the supplementary material of our previous study [11]. Bacterial genomic DNA was extracted from healthy and COPD sputum samples using Qiagen DNA Mini kit. The variable 4 (V4) region of the 16S rRNA gene was PCR-amplified with the appropriate reagent controls [9, 11], and was sequenced using Illumina Miseq. The demultiplexed and quality-filtered sequencing reads were subject to open-reference operational taxonomic unit picking (97% identity cutoff) using QIIME 1.9 [21].

Seven OTUs were detected with > 10 sequencing reads in the negative reagent controls (Additional file 1: Table S2). Although negative reagent controls were performed for all DNA isolation, extraction and PCR amplification step, we performed further analyses to ensure that potential contamination risks were minimized. We compared our results against the 92 contaminant genera detected in sequenced negative ‘blank’ controls by Salter et al. [22]. We failed to detect 56 out of the 92 contaminant genera in our dataset (Additional file 1: Table S3). Of the remaining genera that were found in our data, none had an average relative abundance greater than 0.0004, or had a relative abundance greater than 0.1 in a particular sample, except for Streptococcus which contains known lung pathogens.

Bacterial qPCR assays

All qPCR assays were performed using 384-well microbial DNA qPCR arrays (Qiagen, Germantown, MD) on a QuantStudio 12 K Flex Real-Time PCR System (Life Technologies, Carlsbad, California, USA). The 10 μl reaction mixture contained 5 μl of Microbial qPCR master mix with ROX and 5 μl of Microbial-free water (Qiagen, Germantown, MD). Each well was spotted with a mix of two PCR primers (10 μM each) and one 5′-hydrolysis probe (5 μM) with 10 ng of added sample DNA. The following cycling parameters were used: initial cycle of 95 °C for 10 min; 40 cycles of 95 °C for 15 s; and 60 °C for 2 min. All qPCR templates were run in duplicate and tested for amplification inhibition by use of a positive PCR Control (Qiagen, Germantown, Maryland, USA). For standard curve calculation, each plate run included a decimal serial dilution of double-stranded DNA oligos (Integrated DNA Technologies, Skokie, Illinois, USA) designed from the 16S rRNA gene of pan-bacteria, Haemophilus influenzae, Moraxella catarrhalis, Streptococcus pneumoniae, Prevotella melaninogenica and Veillonella dispar. The cycle threshold values and DNA copy numbers were calculated using the QuantStudio 12 K Flex software (Life Technologies, Carlsbad, California, USA).

Host RNA microarray analysis

Host transcriptome was profiled for 38 COPD sputum samples (Additional file 1: Figure S1). Total RNA was extracted using Trizol reagent (Invitrogen) from sputum cell pellets and further purified with a RNeasy mini kit (Qiagen, Valencia, California, USA) according to the manufacturer’s instructions. RNA quality was evaluated on the Agilent 2100 Bioanalyzer and quantitated by OD260. For samples passing RNA QC criteria (RIN > 5.5, A260/280 value 1.6–2.4, total RNA > 50 ng, presence of distinct 28S and 18S ribosomal RNA peaks), 50 ng RNA was used for NuGEN amplification and labeling of probes using the NuGEN Ovation RNA Amplification System (NuGEN Technologies). The amplified sscDNA was purified using the Agencourt RNAClean magnetic bead clean-up system. The sscDNA samples were quantified by spectrophotometry and profiled on Agilent 2100 Bioanalyser prior to array hybridization. The array hybridization was performed using Affymetrix GeneChip HG-U133 Plus 2.0 microarray (Affymetrix, Santa Clara, California, USA), which contains 54,675 probe-sets interrogating 50,155 human transcripts. The raw microarray data (CEL files) were corrected for background signal, quantile normalized and summarized using robust multiarray average (RMA) normalization to generate probe-set-level microarray data using Array Studio v10.0 (OmicSoft, Cary, North Carolina, USA). The probe-set-level microarray data were log2 transformed and converted to gene-level (24,442 genes) by selecting the probe with greatest inter-quantile range for its corresponding gene as suggested previously [23]. The microbiome and microarray data are deposited at the National Centre for Biotechnology Information Sequence Read Archive (SRP136124) and Gene Expression Omnibus databases (GSE112165), respectively.

Proteomic assays

Host proteome was characterized for 37 sputum samples using the SOMAscan® platform (Somalogic, Additional file 1: Figure S1). The SOMAscan® assay has been described in detail previously [24,25,26]. The assay quantitatively transforms the proteins present in a biological sample into a specific SOMAmer-based DNA signal. Briefly, each SOMAmer® reagent binds a target protein (in total 1310 proteins) and is quantified on a custom Agilent microarray hybridization chip. Normalization and calibration were performed according to SOMAscan® Data Standardization and File Specification Technical Note (SSM-020). The output of the SOMAscan® assay was reported in relative fluorescent units and was log2 transformed for downstream analysis.

Statistical analysis

Differentially represented bacterial taxa were identified using edgeR [27]. Differentially expressed genes in microarray were identified using limma in R bioconductor [28] and were enriched for pathways using MetaCore v5.0 (Thomson Reuters). Multivariate modeling was performed to associate microbiome, host transcriptome and proteome data. The power calculation was performed following the procedure of Morgan et al. [29]. Specifically, correlated variable pairs were simulated with standard normal distribution and a sample size of 40, the number of samples that have both transcriptome/proteome and microbiome data. The 80th percentile of raw P-values of the Spearman correlation test was calculated as a function of true covariance of the variables. The number of allowable tests for 80% power and 5% type I error rate was estimated by Bonferroni correction, which is 0.05 divided by the 80th percentile of raw P-values calculated as above.

To reduce dimensionality, a Principal Component Analysis (PCA) was performed on the gene-level microarray data. PCA was also performed on the Somalogic proteome profile of 1310 proteins. The transcriptome Principal Components (tPCs) and proteomic Principal Components (pPCs) with proportion of variance > 2% were selected for association testing. For the microbiome datasets, 9 bacterial genera with average relative abundance > 1% were selected for association testing. A variance-stabilizing arcsin square root transformation was applied to the microbiome proportion data. All continuous variables were scaled to unit variance. For each genus and Shannon diversity, a general linear mixed model (GLMM) was established associating the variable with tPCs and pPCs adjusting for timepoints and patient demographic factors including smoking status, GOLD status and exacerbation frequency using lme4 in R [30]. Subject ID was included as a random variable to adjust for multiple measures per subject. The model was optimized in terms of Akaike information criterion (AIC) through backward elimination of non-significant effects in a stepwise algorithm using the “step” function in the R lmerTest package [31]. The same GLMM was applied for associating each pPCs with all tPCs. For association within stable samples, as no repeated measures were involved, a general linear model (GLM) was established using glm in R [32]. The model was optimized in terms of AIC through backward elimination of non-significant effects in a stepwise algorithm using the “step” function in the R stats package [32].

To assess functional enrichment of each tPC, all 24,442 genes were ranked by their loadings in that tPC and a Gene Set Enrichment Analysis (GSEA) [33] was performed on the ranked gene list using concatenated MetaCore (GeneGo), KEGG, Reactome, BioCarta and Pathway Interaction Database (PID) pathways (a total of 2809 pathways) using GSEAP reranked 6.0.10 in GenePattern [34]. The enrichment scoring scheme was set to ‘classic’ as suggested in the program instructions. One thousand permutations were performed for each run. Gene sets larger than 500 genes or smaller than 15 genes were excluded from the analysis.

The false discovery rate (FDR) method was used to adjust P-values for multiple testing wherever applicable [35].


Sputum microbiome between healthy and COPD and during exacerbations

Sputum microbiome was characterized for 101 sputum samples (Fig. 1a) from COPD patients (Fig. 1b) and healthy controls (Fig. 1c). A total of 16,386,538 reads were generated after demultiplexing and quality filtering. After rarefaction to 89,462 reads per sample, 2807 OTUs were identified among all samples. Similar to other lung microbiome studies [5, 6, 8,9,10, 36,37,38,39], the majority of OTUs belonged to Firmicutes (53.6%), Bacteroidetes (21.9%) and Proteobacteria (19.5%) at the phylum level, and Veillonella (37.7%), Prevotella (15.3%), Haemophilus (14.0%), Streptococcus (8.6%) and Moraxella (2.9%) at the genus level. Quantitative PCR showed significant correlations between the absolute quantities of all five species and their relative abundances in the microbiome data (Spearman’s rho ≥0.43, FDR P = 2.57E-3).

Fig. 1
figure 1

Overview of the sputum microbiome taxa distributions. a Overall study clustering for all 101 samples. b Clustering of 32 COPD stable samples. c Clustering of 16 healthy samples. Each column represents one sample colored by different subgroups. Y-axis represents relative abundances of major phyla and genera. Samples were clustered by UPGMA clustering based on the weighted UniFrac distances. HNS: healthy non-smokers, HS: healthy smokers, CS: COPD current smokers, ES: COPD ex-smokers, non-ICS: non-ICS exposer, ICS: ICS exposer, IE: infrequent exacerbators, FE: frequent exacerbators

Significantly increased relative abundance of Haemophilus was observed in healthy smokers versus non-smokers (log2FC = 3.36, FDR P = 0.041), and in COPD ex-smokers versus current smokers (log2FC = 2.49, FDR P = 0.025, Fig. 2a). Comparison of the microbiome profiles between healthy subjects and stable COPD patients showed a significantly increased relative abundance of the genera Moraxella, Streptococcus and Actinobacteria (log2Fold Change (log2FC) ≥ 1.32, FDR P = 0.026, Additional file 1: Table S4) and decreased alpha diversity (Shannon, P = 0.036) in stable COPD patients (Fig. 2b). A significantly increased Moraxella was observed at stable state in GOLD III versus II patients and in inhaled corticosteroids (ICS) versus non-ICS exposed patients (Additional file 1: Figure S2).

Fig. 2
figure 2

Sputum microbiome profiles in healthy subjects and COPD patients. a Shannon diversity and relative abundance of major bacterial taxa in healthy controls and stable COPD patients, and in healthy and COPD subgroups in relation to smoking status. b Shannon diversity and relative abundances of major bacterial taxa in COPD patients at different visits. The number of samples in each group is indicated in the parenthesis. Significantly differentially represented bacterial taxa were identified using edgeR [27]. For visit, statistical analysis was performed on each adjacent two time points. E0: COPD exacerbations, E2: 2 week post-exacerbations, E6: 6 week post-exacerbations, 6 Months: 6 months from first stable visit, HNS: healthy non-smokers, HS: healthy smokers, CS: COPD current smokers, ES: COPD ex-smokers. *** FDR P < 0.001, ** FDR P < 0.01, * FDR P < 0.05

During COPD exacerbations, increased Moraxella (log2FC = 3.14, FDR P = 0.019) and decreased alpha diversity was observed compared to stable state (unpaired analysis, Fig. 2b, paired analysis see Additional file 1: Figure S3), along with significantly increased neutrophil and decreased macrophage percentage (FC ≥ 1.2, P ≤ 0.05, Additional file 1: Figures S4–S5). A non-significant increase of total bacterial load was observed during exacerbations (Additional file 1: Figure S6). Conversely, the trend of increased Moraxella and decreased alpha diversity was reversed at post-exacerbation time points (Fig. 2b).

Sputum neutrophil counts were most significantly associated with microbiome compositions, with positive correlations with Haemophilus and Neisseria, and negative correlations with Streptococcus, Megasphaera and Veillonella across all samples (Spearman’s rho = 0.33, FDR P ≤ 0.05, Fig. 3, Additional file 1: Table S5). The significant correlation between Haemophilus and sputum neutrophil count was further confirmed by qPCR (Spearman’s rho = 0.37, P = 0.037, Additional file 1: Table S6). No bacterial taxa or sputum cell counts were associated with QoL scores, FEV1 or FVC.

Fig. 3
figure 3

Significant spearman correlations (with 95% confidence intervals calculated by univariate regression model) between major sputum microbiome compositions with sputum leukocyte percentages

Host transcriptome and proteome at COPD stable state and exacerbations

We compared host transcriptome differences between COPD stable and exacerbations. A substantial amount of 2453 upregulated and 4814 downregulated differentially expressed genes (DEGs) were identified at exacerbations versus stable state (FC ≥ 1.5, FDR P ≤ 0.05), in which 239 and 8 MetaCore pathways were significantly enriched respectively (FDR P ≤ 0.01, Additional file 2). A large proportion of the upregulated pathways were involved in immune response with the top pathways being interferon and interleukin-6 signaling pathways. The downregulated pathways included cell cycle, nucleotide metabolism and phagocytosis pathways. No DEGs were found between stable patient subgroups according to clinical characteristics (GOLD stage, smoking status, ICS administration and exacerbation frequency).

For patient proteome data, 790 of the 1310 proteins had significantly higher expression levels in stable COPD ex-smokers compared to current smokers, including multiple pro-inflammatory markers such as interleukin-36, fibrinogen and matrix metallopeptidase 10 (FC ≥ 1.5, FDR P ≤ 0.05, Mann-Whitney-Wilcoxon test, Additional file 2). No differentially expressed proteins were identified for other comparisons.

Haemophilus and Moraxella are most significantly associated with host transcriptome and proteome

To gain insights into airway host-microbiome interactions in COPD, we established a multivariate linear model between microbiome, host transcriptome and proteome profiles across all samples (including exacerbations) and within stable samples only. We first performed a power estimation and calculated that given a true covariance of 0.5 between bacterial taxa and gene expression in 40 samples (the number of samples with both transcriptome/proteome and microbiome data), it would be possible to perform a maximum of 102 pairwise tests (or approximately 10 microbiome and 10 host expression factors) and retain 80% power and an alpha of 0.05 using Bonferroni correction (Additional file 1: Figure S7). As it is impossible for significant associations to survive correction for multiple testing of ~ 20,000 human genes, we performed an unsupervised dimensionality reduction on host multi-omics data using PCA. A total of 9 transcriptome and 8 proteome PCs (tPCs and pPCs, respectively) with proportion of variance > 2% were selected, together explaining 72 and 84% of observed variance. Using all samples (including exacerbations), a GLMM was established between each of the 9 major bacterial genera and all tPCs or pPCs, adjusting for different timepoints and patient demographic factors. Among all genera, Haemophilus and Moraxella were most strongly associated with host factors, in particular strong positive correlations with tPC2 and tPC4, respectively (FDR P < 5.0E-4, Fig. 4a, Table 2, Additional file 3).

Fig. 4
figure 4

Multivariate modeling showed strong association of Haemophilus and Moraxella with host transcriptome and proteome profiles. a A host-microbiome interaction network illustrating significant associations among the 9 most abundant bacterial genera and Shannon diversity, tPCs and pPCs in GLMM. Each edge indicates a significant association (FDR P ≤ 0.05) colored by direction. The edge weight corresponds to the significance of the P-value. The size of the node is proportional to the number of significant associations involving the node. b GSEA enrichment scores of the top pathways on the loadings of each tPC. For each tPC, the top 10 positively and negatively enriched pathways (FDR P ≤ 0.01) were included in the heatmap. Pathways were clustered using complete clustering and colored by their clustering groups. The functional categories of the pathways are overall in agreement with their clustering groups. c Top loadings of each pPC. For each pPC, the top 6 proteins by magnitude of loadings were included in the heatmap

Table 2 Associations of the 9 bacterial genera and Shannon diversity with tPCs both across all samples and within stable samples in generalized linear mixed models. FDR P-values are indicated in the table. Significant associations are highlighted in asterisks. Only significant variables were included in the final model unless otherwise stated

Examining the top loadings of tPC2 and tPC4 revealed that they reflected increased expression of some immune response genes, such as interleukin-1 receptor-associated kinase 1 (IRAK1), interleukin-18 binding protein (IL18BP), linker for activation of T-cells family member 1 (LAT) for tPC2, and several interferon genes for tPC4 (Additional file 1: Figure S8, Additional file 4), indicating that high level of these two tPCs might correspond to increased immune activities. We performed GSEA on the loadings of each tPC to further understand its functional properties. Both tPC2 and tPC4 were most significantly positively enriched for host immune response pathways. Several T-cell differentiation (i.e. Th1 and Th2 cell differentiation) and pro-inflammatory cytokine (i.e. IL-12) signaling pathways were among the top positive pathways for tPC2, while interferon signaling pathways were the top positive pathways for tPC4 (Fig. 4b, Additional file 4). Individual genes in the top pathways of tPC2 and tPC4 showed consistent correlations with Haemophilus and Moraxella respectively in both microbiome and qPCR datasets (Additional file 5), further supporting the associations in the multivariate models. Furthermore, both Haemophilus and tPC2 exhibited positive correlations with pPC3 (FDR P = 3.1E-3, Table 3, Additional file 3), together forming an interconnected subnetwork (Fig. 4a). Likewise, both Moraxella and tPC4 were positively correlated with pPC1 (FDR P = 0.014, Additional file 3). Multiple pro-inflammatory markers such as interleukin-1 receptor, matrix metalloproteinase 7, galectin-2 and TNF-related weak inducer of apoptosis were among the top loadings for pPC1 or pPC3 (Fig. 4c). In addition, several known bronchial epithelial cell receptors that respond to bacterial lipopolysaccharide (LPS) such as angiopoietin-1 receptor [40] and Ephrin-A2 [41] were among the top loadings for the two pPCs (Additional file 4). The correlations of Haemophilus-tPC2-pPC3 and Moraxella-tPC4-pPC1 were further confirmed by qPCR (P ≤ 0.1, Additional file 1: Table S6). In addition, both tPC4 and pPC1 were increased at exacerbations versus stable (Additional file 1: Figure S9). Both tPC2 and pPC3 were significantly positively correlated with sputum neutrophil counts (FDR P ≤ 0.05, Additional file 1: Table S5).

Table 3 Associations of the 9 bacterial genera and Shannon diversity with pPCs both across all samples and within stable samples in generalized linear mixed models. FDR P-values are indicated in the table. Significant associations are highlighted in asterisks. Only significant variables were included in the final model unless otherwise stated

Among other major genera, Megasphaera was strongly positively correlated with tPC1 and pPC7 (FDR P = 2.0E-4), which were negatively enriched for host immune pathways such as IL-17 and interferon pathways (Fig. 4b) and associated with reduced expression of pro-inflammatory markers such as C-C motif chemokine 20 and interleukin-36 (Fig. 4c, Additional file 4). Therefore, increased abundance of Megasphaera could be associated with reduced airway inflammatory responses. In comparison, other major genera such as Streptococcus and Veillonella were associated with relatively little host response.

Within stable samples only, the Haemophilus-tPC2-pPC3 associations persisted, while Moraxella was not associated with any host PCs (Additional file 1: Figure S8, Tables 2-3). Within stable, Streptococcus showed a significantly negative correlation with tPC6 (FDR P = 3E-3, Table 2, Additional file 3), in which several phagocytosis and neutrophil migration pathways were most negatively enriched. Thus, increased Streptococcus could be associated with greater expression of these pathways at stable. Furthermore, an unknown genus in S24–7 family had significant positive correlations with tPC9 at stable (FDR P = 2E-3, Table 2, Additional file 3), in which IL-12, IL-23 and T-cell differentiation pathways were most negatively enriched. This genus, despite its low abundance, could be associated with reduced inflammatory response at stable.


Here we present the first comprehensive study characterizing airway host-microbiome interactions in COPD integrating lung microbiome and host multi-omics datasets both in stable state and during exacerbations. The systems biology approach revealed a significant airway host-microbiome interplay associated with COPD inflammation and exacerbations. Among all major genera, Haemophilus and Moraxella were most strongly associated with host gene expression profiles, particularly immunity and inflammation, suggesting the two genera as key players in airway host-bacterial crosstalk in COPD.

Importantly, our results revealed different timing of host responses to these two genera. While Haemophilus was associated with host responses both in stable state and during exacerbations, the associations for Moraxella were primarily related to exacerbations. This is consistent with a previous study [42] and highlights the role of Haemophilus as a stable airway colonizer and Moraxella as an exacerbation-related opportunistic pathogen in COPD. Furthermore, the Haemophilus-associated immune responses were correlated with the degree of neutrophilic inflammation, underscoring the interactions between bacterial presence, host immune responses and cellular inflammation. This suggests that chronic airway inflammation in some COPD patients may not respond to anti-inflammatory therapies alone [43] unless the underlying bacterial infection driving the abnormal immune response is addressed.

To achieve statistical power for a genome-wide analysis associating microbiome and host multi-omics datasets in a relatively small sample set, we performed dimensionality reduction on host data and used multivariate modeling to identify significant associations between microbiome composition and overall patterns of host gene expression. Similar approaches were employed by Morgan et al. in associating gut microbiome with host transcriptome in inflammatory bowel disease patients [29]. The strong correlations of Haemophilus and Moraxella with host immune and inflammation-related tPCs and pPCs highlight the positive links between the two genera and host immune responses that predominated airway host-microbiome interactions. The presence of lipopolysaccharide-induced bronchial epithelial receptors among the top loadings demonstrates that our approach can recapitulate an active host-bacterial crosstalk in COPD. While both Haemophilus and Moraxella were positively associated with T-cell induced pro-inflammatory signaling, the interferon signaling was more strongly linked to Moraxella than Haemophilus. This is consistent with one previous study showing that M. catarrhalis but not H. influenzae induced interferon-beta expression in bronchial epithelial cells [44] and aligns with the different pathogenicity profiles between the two pathogens [45]. Differential involvement of viral co-infection could be another important factor [46]. We have not fully characterized the sputum viral load of this cohort due to limited sputum available. Additional viral load data is key to further resolving this question.

Our multivariate analysis showed that Megasphaera and an unknown genus in S24–7 family were associated with reduced expression of host inflammatory pathways and therefore could potentially reverse airway inflammation (i.e. interferon, IL-12 pathway) induced by Haemophilus and Moraxella. Furthermore, Megasphaera was negatively correlated with sputum neutrophil counts. Megasphaera is a known member of human lung microbiome [39] and has beneficial effects on the host through short chain fatty acids (SCFAs) production [47]. In the lung microenvironment, bacterial SCFAs were shown to inhibit cytokine production and inflammation after LPS stimulation of macrophages [48]. Trompette et al. also showed that bacterial SCFAs reduce neutrophil recruitment to the airways and protect against influenza virus infection in mice, suggesting that it has anti-inflammatory effects [49]. Cait et al. demonstrated that diet-derived SCFAs ameliorate allergic inflammation in mice, suggesting its anti-inflammatory effects in the lung [50]. One study on oropharyngeal microbiome of H7N9-infected patients showed that Megasphaera increased in patients without secondary bacterial infection, suggesting its potential role in preventing colonization of respiratory pathogens [51]. Further validation on the identity and prevalence of these genera is warranted to explore their functions in the COPD lung.

Our study provides novel insights on the impact of smoking on the lung microbiome, although individual subgroups had small sample sizes and the results need further confirmation in larger cohorts. Our results suggest that the effect of current smoking on the lung microbiome differs between healthy subjects and COPD patients. In healthy subjects, a significantly increased Haemophilus was observed in smokers versus non-smokers, suggesting that smoking could be a risk factor for airway dysbiosis in healthy populations. In COPD patients, a significantly increased Haemophilus was observed in ex-smokers versus current smokers. The greater dysbiosis in COPD ex-smokers was further associated with their greater airway inflammatory states, as evident by significantly higher expression of sputum pro-inflammatory markers. Our findings further support the view that smoking likely had resulted in an irreversible airway inflammation in COPD, which persisted despite smoking cessation [52].

We observed significant increase of Moraxella in stable COPD patients versus healthy subjects, and in COPD exacerbations versus stable, in agreement with previous observations [5, 8, 9, 36]. The reversal trends of microbiome diversity and composition prior and post exacerbations further support the lung microbiome dysbiosis during exacerbations. Increased Haemophilus and Moraxella were found in stable ICS versus non-ICS exposed patients, consistent with earlier observations [6, 9], with the caveat being the small sample size of non-ICS users. At stable, the microbiome was comparable between frequent and infrequent exacerbators, suggesting that the baseline microbiome does not effectively predict exacerbation frequency. Identifying markers that predict the exacerbation frequency is of great importance for COPD management [53]. Differences in baseline respiratory microbiota composition were hypothesized to explain the different exacerbation frequency in COPD patients [13]. However, neither this study nor earlier reports support this hypothesis [54, 11]. Instead, previous longitudinal studies showed that there is an association between temporal variability of the airway microbiome and patient exacerbation frequency [11, 12], suggesting that the frequent exacerbator phenotype might be more relevant to the de-stabilization of the microbiome over time but not the microbiome composition at baseline per se. We observed no significant association between microbiome or sputum cell count changes with CAT score, FEV1 or FVC, which suggests that different patient inflammatory profiles (i.e. neutrophilic or eosinophilic inflammation) and their associated airway microbiome changes are likely independent of disease severity and cannot be distinguished clinically [55].

There are several caveats to our study. First, the sample size was relatively small particularly for the subgroup analysis, and the longitudinal profiling was limited due to the limited amount of sputum produced in some visits and the technical difficulty in extracting sufficient material from sputum for the various aspects of downstream experiments (i.e. microbiome, transcriptome, proteome, cell counting). We performed a power estimation to ensure adequate statistical sensitivity could be achieved after dimensionality reduction. Nevertheless, the associations observed in our study need to be validated in larger independent patient cohorts. Second, host transcriptome and proteome were not profiled for healthy subjects, which is important to understand to what extent the observed host-microbiome associations are disease specific. Our study provides a method for profiling airway host-microbiome interactions that should catalyze future efforts on characterizing lung microbiome and host multi-omics in larger healthy and disease populations.


To our knowledge, this is the first study that depicts airway host-microbiome interactions in COPD and highlights the differential role of Haemophilus and Moraxella in terms of host interactions. Our study provides support for novel therapies targeting both genera and their associated host pathways to overcome the abnormal immune response in COPD.