The composition and function of the airway microbiota have been associated with respiratory conditions, such as pneumonia [1, 2], asthma [3,4,5,6,7,8,9], and chronic obstructive pulmonary disease [4, 10,11,12]. However, the airway is vast and extends from the nasal openings to the alveoli deep within the lungs. Ideally, samples for microbial analysis would be collected as close as possible to the cellular inflammatory responses contributing to the respiratory illness in question, which occur in the lower respiratory tract for bronchiolitis.

Unfortunately, the deeper segments of the airway (the lower respiratory tract) must be assessed through more invasive methods, such as bronchoscopy, which is not feasible for large-scale studies. Furthermore, easy sample collection is especially important for studies involving infants and young children. As a result, multiple studies in this young population have been conducted on the microbiome using nasal swabs (NSs) or nasal brush specimens, which are easier to collect and less invasive than samples from deeper within the airway [3, 6, 7, 13, 14].

Questions remain about the correlation between the microbiota within different microenvironments along the airway. Although prior studies have demonstrated strong correlations between upper and lower airway microbiology [15, 16] and virology [17], other data have suggested significant spatial variation in the composition and structure of the upper and lower airway microbiota [18, 19]. Additionally, there are emerging data about the correlation between the microenvironments in the upper airway [7, 20]. Yan et al. found in 12 healthy adults that the microbiota of the anterior nares was significantly different from that of the middle meatus and sphenoethmoidal recess [20]. Pérez-Losada et al. compared the microbiota of the inferior turbinate and nasopharynx among 30 children and adolescents with asthma [7] and found that these two regions had distinct microbial compositions. However, both of these studies were single-center studies with fewer than 40 participants. Moreover, while previous studies have examined compositional differences between microenvironments, no previous study has assessed whether the associations between the microbiota and a clinical outcome differ between regions of the upper airway, particularly among young children and infants, for whom less invasive sampling methods are imperative.

As part of the 35th Multicenter Airway Research Collaboration (MARC-35) study, site teams collected both NS and nasopharyngeal aspirate (NPA) samples from children hospitalized for bronchiolitis within 24 h of hospitalization [21]. Our two objectives in this secondary analysis of MARC-35 data were to (1) compare the microbial composition between the anterior nares and nasopharynx of infants hospitalized with bronchiolitis and (2) assess whether associations between NPA microbial composition and bronchiolitis severity [21] would be replicated or enhanced using NS microbial data. We hypothesized that systematic co-analysis of NS and NPA samples from MARC-35 participants would reveal both shared and divergent microbial compositions, and that both sample types would be associated with severity of illness.


Study design

The 35th Multicenter Airway Research Collaboration (MARC-35) is a multicenter prospective cohort study of infants (age < 1 year) hospitalized for bronchiolitis. The study was coordinated by the Emergency Medicine Network (EMNet) [22]. MARC-35 was conducted at 17 sites across the USA during three consecutive bronchiolitis seasons (November 1 to April 30) from 2011 to 2014.

Participants consisted of infants diagnosed with bronchiolitis (as defined by the American Academy of Pediatrics) by the attending physician [23]. The exclusion criteria included previous enrollment in the study, consent to the study more than 24 h after hospitalization, transfer to a participating hospital more than 24 h after hospitalization, and known heart-lung disease, immunodeficiency, immune suppression, or gestational age less than 32 weeks. The institutional review board at each of the 17 participating hospitals approved the study, and patients were treated at the discretion of the attending physician.

Data collection

Structured interviews of the parent or guardian were performed by site investigators to determine the demographic characteristics, medical and family history, and details of the acute illness for each patient. Additional clinical details were collected via emergency department and hospital inpatient chart reviews. Reviewers at the EMNet Coordinating Center reviewed all data and inquired about discrepancies and missing data with site investigators.

Trained site investigators collected NS and NPA samples using standardized protocols [24, 25]. The site investigators collected NS samples from the anterior nares within 24 h of hospitalization. Both nares were swabbed with a single nylon, pediatric FLOQSwab (Copan, Brescia, Italy). The NS samples were placed into a vial containing 2 mL of transport media (15% glycerol in Iscove’s media) and mailed to Massachusetts General Hospital (MGH) (Boston, MA) via US mail, where they were stored at −80 °C upon receipt. NS samples were then shipped on dry ice from MGH to Baylor College of Medicine (Houston, TX) via overnight mail, where they were again stored at −80 °C. The NS collection and shipping procedure was designed to emulate future parent-collected, mailed nasal swabs and thereby ensure maximum comparability of all the nasal swab specimens collected in MARC-35.

For the NPA samples, all site teams used the same collection equipment (Medline Industries, Mundelein, IL) and also collected the samples within 24 h of hospitalization. For the collection, the child was placed supine; 1 mL of normal saline was instilled into one naris, and then an 8 French suction catheter was used to remove the mucus. This procedure was performed once on each naris. After the sample collection from both nares, 2 mL of normal saline was suctioned through the catheter to clear the tubing. Immediately after collection, the NPA sample was added to the same transport medium as the NS in a 1:1 ratio and placed on ice. Within 1 h of collection, the NPA sample was refrigerated at 4 °C. Within 24 h of collection, the sample was transferred to a −80 °C freezer until shipped on dry ice to Baylor College of Medicine (Houston, TX), where it was again stored at −80 °C.

Microbiota community profiling

The composition of NS and NPA microbiota was characterized at the Alkek Center for Metagenomics and Microbiome Research (CMMR) at Baylor College of Medicine by sequencing the bacterial 16S rRNA gene V4 region on the Illumina MiSeq platform as described in the initial analysis of MARC-35 data. The NPA samples were additionally tested for 17 viral pathogens (e.g., rhinovirus, respiratory syncytial virus) using real-time polymerase chain reaction (PCR) assays [21]. Briefly, bacterial genomic DNA was extracted using the MO BIO PowerSoil DNA Isolation Kit (MO BIO Laboratories). The 16S rDNA V4 region was amplified by PCR and sequenced on the MiSeq platform (Illumina) using the 2 × 250 bp paired-end protocol, yielding paired-end reads that overlap almost completely. The primers used for amplification contain adapters for MiSeq sequencing and single-end barcodes, allowing pooling and direct sequencing of PCR products [26]. Sequencing read pairs were demultiplexed based on the unique molecular barcodes, and reads were merged using USEARCH v7.0.1090 [27], allowing zero mismatches and a minimum overlap of 50 bases. Merged reads were trimmed at the first base with a Q5 quality score. We calculated the expected error after taking into account all Q-scores across all the bases of a read and the probability of an error occurring [28]. Additionally, a quality filter was applied to the resulting merged reads, and reads containing > 0.05 expected errors were discarded. Rarefaction curves of bacterial operational taxonomic units (OTUs) were constructed using sequence data for each sample to ensure coverage of the bacterial diversity present. Samples with suboptimal amounts of sequencing reads were re-sequenced to ensure that the majority of bacterial taxa were encompassed in our analyses. Positive and negative controls were included in the extraction, amplification, and sequencing processes together with the study samples for quality control and assurance purposes. The positive control consisted of a known and previously sequenced bacterial genome that is not expected to be found in the study samples. Negative controls were non-template controls composed of the reagents used in every process from sample extraction to sequencing. There was amplification on positive controls and no amplification on negative controls.
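The expected-error filter described above can be illustrated with a minimal sketch. It assumes the standard Phred relationship between a quality score Q and the per-base error probability, P = 10^(−Q/10), and sums these probabilities over a read. The function names and the toy reads are hypothetical and are not taken from the study pipeline; only the 0.05 threshold comes from the text.

```python
def expected_errors(phred_scores):
    """Expected number of errors in a read: the sum of per-base
    error probabilities, each given by P = 10 ** (-Q / 10)."""
    return sum(10 ** (-q / 10) for q in phred_scores)


def passes_filter(phred_scores, max_expected_errors=0.05):
    """Keep a read only if its expected errors do not exceed the threshold
    (0.05 is the cutoff stated in the text)."""
    return expected_errors(phred_scores) <= max_expected_errors


# Hypothetical merged reads of 250 bases each.
high_quality_read = [40] * 250  # every base at Q40 (error probability 1e-4)
low_quality_read = [20] * 250   # every base at Q20 (error probability 1e-2)

keep_high = passes_filter(high_quality_read)
keep_low = passes_filter(low_quality_read)
```

Under these assumptions, a uniformly Q40 read accumulates about 0.025 expected errors and is retained, whereas a uniformly Q20 read accumulates about 2.5 and is discarded.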

16S rRNA gene sequences were clustered into OTUs at a similarity cutoff value of 97% using the UPARSE algorithm [29]. OTU taxonomy was assigned by mapping the centroids to a SILVA database [30] containing only the 16S V4 region. A custom script constructed a rarefied OTU table from the output files generated in the previous two steps for downstream analyses of alpha-diversity (e.g., Shannon index) and beta-diversity (e.g., weighted UniFrac distance matrix) [31, 32]. The Shannon diversity index is a quantitative measure that takes into account not only richness but also the proportion of each taxon (evenness) within the local community. The weighted UniFrac algorithm calculates the distance between microbial communities based on the phylogenetic relatedness of lineages and their relative abundance in each sample.
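As an illustration of how the Shannon index responds to both richness and evenness, the following minimal sketch computes H = −Σ p_i ln p_i from a vector of OTU counts. The use of the natural logarithm and the toy count vectors are assumptions made for illustration; the study's custom script is not reproduced here.

```python
import math


def shannon_index(counts):
    """Shannon diversity H = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical OTU count vectors with the same richness (four taxa observed
# in the first two) but different evenness.
even_community = [25, 25, 25, 25]
dominated_community = [97, 1, 1, 1]
single_taxon_community = [100, 0, 0, 0]

h_even = shannon_index(even_community)
h_dominated = shannon_index(dominated_community)
h_single = shannon_index(single_taxon_community)
```

A perfectly even community of four taxa gives H = ln 4, a community dominated by one taxon gives a much smaller value, and a community made up of a single taxon gives H = 0.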

Statistical analyses

The relative abundance of each OTU was calculated for each sample, and the OTUs were combined at the genus level. For each sample type, the overall abundance of each genus was calculated by taking the sum of relative abundances by genus over all subjects. The union of the 10 most abundant genera for each site defined the overall 15 top genera for comparative analyses.
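The selection of top genera can be sketched as follows, assuming each sample is represented as a mapping from genus name to relative abundance. The function names and the toy data are hypothetical; the example uses the top two genera per sample type rather than the top 10 so that the output stays small.

```python
def top_genera(samples, n=10):
    """Rank genera by summed relative abundance over all samples; return the top n."""
    totals = {}
    for sample in samples:
        for genus, abundance in sample.items():
            totals[genus] = totals.get(genus, 0.0) + abundance
    ranked = sorted(totals, key=lambda genus: totals[genus], reverse=True)
    return ranked[:n]


def union_of_top_genera(ns_samples, npa_samples, n=10):
    """Union of the top-n genera from each sample type, sorted by name."""
    return sorted(set(top_genera(ns_samples, n)) | set(top_genera(npa_samples, n)))


# Hypothetical relative abundances for two infants per sample type.
ns_samples = [
    {"Staphylococcus": 0.7, "Corynebacterium": 0.2, "Moraxella": 0.1},
    {"Staphylococcus": 0.5, "Corynebacterium": 0.3, "Moraxella": 0.2},
]
npa_samples = [
    {"Moraxella": 0.5, "Streptococcus": 0.4, "Staphylococcus": 0.1},
    {"Moraxella": 0.6, "Streptococcus": 0.3, "Haemophilus": 0.1},
]

combined = union_of_top_genera(ns_samples, npa_samples, n=2)
```

Because the two sample types share some but not all of their most abundant genera, the union is larger than n but smaller than 2n, which is how two lists of 10 genera can yield 15 unique genera.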

To determine whether there was a significant intra-individual correlation between the two upper airway sites, a permutation test was performed on the mean Spearman correlation for the NS and NPA sample pairs. Intra-individual correlations by genus were assessed by computing the Spearman correlation matrix of the top genera abundances for NS versus NPA. Due to the compositional nature of microbiota data, Spearman correlation calculations can overestimate relationships in the data [33]. To determine the significance of the co-occurrence and mutual exclusion relationships between NS and NPA, a correlation network between the two sites was constructed using the approach of Faust et al. [34]. To construct the network, all genera that appeared in at least 10% of the samples were analyzed using four different metrics: Spearman correlation, Pearson correlation, Bray-Curtis dissimilarity, and Kullback-Leibler divergence. The network includes relationships that are significant (P < 0.05) for at least two of the above methods.
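The permutation test on the mean paired Spearman correlation can be sketched as below. This is one plausible reading of the procedure, not the study code: each infant's NS and NPA genus profiles are correlated, the correlations are averaged, and the NPA profiles are then repeatedly reassigned to random infants to build a null distribution. The helper names, the toy profiles, the handling of constant profiles, and the (b + 1)/(B + 1) p-value estimator are all assumptions.

```python
import random


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant profile is treated as uncorrelated
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def mean_paired_spearman(ns, npa):
    return sum(spearman(a, b) for a, b in zip(ns, npa)) / len(ns)


def permutation_test(ns, npa, n_permutations=1000, seed=0):
    """One-sided test: is the observed mean higher than for random pairings?"""
    rng = random.Random(seed)
    observed = mean_paired_spearman(ns, npa)
    shuffled = list(npa)
    at_least_as_large = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break the within-infant pairing
        if mean_paired_spearman(ns, shuffled) >= observed:
            at_least_as_large += 1
    # Adding 1 to numerator and denominator keeps the estimate above zero.
    return observed, (at_least_as_large + 1) / (n_permutations + 1)


# Hypothetical genus profiles for six infants (NPA identical to NS for clarity).
ns_profiles = [
    [0.6, 0.3, 0.1, 0.0], [0.1, 0.6, 0.3, 0.0], [0.0, 0.1, 0.6, 0.3],
    [0.3, 0.0, 0.1, 0.6], [0.1, 0.0, 0.3, 0.6], [0.6, 0.1, 0.0, 0.3],
]
npa_profiles = [list(profile) for profile in ns_profiles]

observed, p_value = permutation_test(ns_profiles, npa_profiles)
```

With this estimator, the smallest p-value attainable from B permutations is 1/(B + 1), so a permutation test with 10,000 permutations cannot support a claim stronger than roughly P < 0.0001.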

For each infant, we adjoined the genus abundance measurements for the top 15 genera from the NS and NPA samples and created a composite microbiota profile. Based on this composite data, we determined common patterns of genus abundance within individuals when comparing NS and NPA sites. We then used the composite data and Bray-Curtis dissimilarity metric to cluster subjects by partitioning around medoids (PAM). The optimal number of clusters was determined using the gap statistic and average silhouette width [35, 36].
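The composite-profile clustering can be sketched in a few lines. The example below concatenates each infant's NS and NPA genus abundances, computes pairwise Bray-Curtis dissimilarities, and runs a small PAM-style search (greedy BUILD followed by SWAP). It is a simplified illustration written for this purpose, not the software used in the study, and the toy abundances and function names are hypothetical; selection of the number of clusters by gap statistic or silhouette width is not shown.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0


def total_cost(dist, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Partitioning around medoids on a precomputed dissimilarity matrix."""
    n = len(dist)
    # BUILD: greedily add the point that lowers the total cost the most.
    medoids = []
    while len(medoids) < k:
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(dist, medoids + [i]))
        medoids.append(best)
    # SWAP: accept any medoid/non-medoid exchange that strictly lowers the cost.
    improved = True
    while improved:
        improved = False
        current = total_cost(dist, medoids)
        for position in range(k):
            for candidate in range(n):
                if candidate in medoids:
                    continue
                trial = medoids[:position] + [candidate] + medoids[position + 1:]
                cost = total_cost(dist, trial)
                if cost < current:
                    medoids, current, improved = trial, cost, True
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Hypothetical abundances of three genera per site for four infants.
ns = [[0.9, 0.1, 0.0], [0.8, 0.1, 0.1], [0.1, 0.9, 0.0], [0.0, 0.8, 0.2]]
npa = [[0.8, 0.2, 0.0], [0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.2, 0.8, 0.0]]

# Composite profile: NS abundances adjoined with NPA abundances.
composite = [a + b for a, b in zip(ns, npa)]
dist = [[bray_curtis(u, v) for v in composite] for u in composite]
medoids, labels = pam(dist, k=2)
```

In this toy example the first two infants, dominated by the same genus at both sites, fall into one cluster, and the last two fall into the other.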

Replication analysis

To replicate the findings comparing NPA microbiota to bronchiolitis severity outcomes using the NS microbiota data [21], the NS samples were clustered separately using the weighted UniFrac distance and PAM clustering. The optimal number of clusters was determined by the average silhouette width. We then determined associations between the NS microbiota profiles and clinical covariates using chi-squared and Kruskal-Wallis tests as appropriate. Parallel to our previous analysis, a fixed-effects logistic regression model and a mixed-effects logistic regression model adjusting for 11 clinical variables were constructed using the NS clusters for each of the severity outcomes [21] (i.e., intensive care use [i.e., admission to intensive care unit and/or use of continuous positive airway pressure and/or intubation during inpatient stay, regardless of location] and hospital length of stay). For the intensive care use outcome, these models were then repeated on isolated microbiota profiles using membership in the profile of interest (e.g., Haemophilus-dominant profile versus all other subjects) as the independent variable.
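For a single binary predictor such as membership in one microbiota profile, the unadjusted odds ratio from a logistic regression equals the cross-product ratio of the corresponding 2 × 2 table. The sketch below computes that ratio with a Wald 95% confidence interval. The counts are invented purely for illustration and are not study data; the adjusted and mixed-effects models described above require a statistical package and are not shown.

```python
import math


def odds_ratio(exposed_events, exposed_nonevents,
               unexposed_events, unexposed_nonevents, z=1.96):
    """Odds ratio and Wald confidence interval from a 2 x 2 table."""
    a, b = exposed_events, exposed_nonevents
    c, d = unexposed_events, unexposed_nonevents
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * standard_error)
    upper = math.exp(math.log(estimate) + z * standard_error)
    return estimate, lower, upper


# Hypothetical counts: infants in the profile of interest versus all others,
# split by whether they required intensive care.
estimate, lower, upper = odds_ratio(
    exposed_events=20, exposed_nonevents=40,
    unexposed_events=60, unexposed_nonevents=600,
)
```

With these invented counts the odds ratio is 5.0 and the confidence interval excludes 1, which is the pattern that would be read as a significant unadjusted association.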


Out of 921 subjects in the MARC-35 longitudinal cohort, site teams collected one NS and one NPA from 920 of these infants. We obtained optimal reads for 819 NS samples and 914 NPA samples. There were 815 subjects with optimal reads of both NS and NPA samples, and this group comprised the analytic cohort. Within this cohort, the median age at hospitalization was 3 months (IQR 2–6 months), the median weight at hospitalization was 6 kg (IQR 5–8 kg), and 31% of the infants had used antibiotics prior to hospitalization.

We found 15 unique genera for comparative analysis after examining the top 10 abundant genera from the NS and NPA (Fig. 1a). The NS microbiota was dominated by the Staphylococcus genus (40.8%). The next most abundant genera were Corynebacterium (10.4%), Moraxella (9.3%), Haemophilus (7.4%), Dolosigranulum (5.2%), Streptococcus (5.0%), and Enterobacter (4.7%), which together with Staphylococcus accounted for 80% of the NS microbiota. The NPA microbiota was dominated by Moraxella (30.7%), Streptococcus (30.5%), and Haemophilus (19.7%) genera, which comprised over 80% of the microbiota. The abundances of the dominant genera in both sample types show high variability (Table 1). Additionally, the NS microbiota showed less bacterial richness and lower Shannon diversity index scores than the NPA microbiota, corresponding to the predominance of Staphylococcus in the NS samples. However, there were several NS (6.3%) and NPA (0.5%) samples that were completely dominated by one genus, indicating that this low diversity state may be a feature of a small percentage of infant upper airway microbiota at the time of hospitalization.

Fig. 1

Comparison of nasal swab and nasopharyngeal microbiota. The genera abundances for the nasal swab and nasopharyngeal samples were calculated by taking the sum of all samples over each genus for each sample type. Combining the top 10 genera for each sample type gave 15 unique overall top genera. a The abundances of each of the top genera were calculated for both of the sample types. b For the top 15 genera, the Spearman correlations between the two anatomic sites are shown. The correlation heat map exhibits asymmetric behavior because it is comparing between the two sample types

Table 1 Richness, alpha-diversity, and abundance by microbiota sample type

Correlations between NS and NPA microbiota

A permutation test demonstrated that the within-individual mean correlation of 0.36 between the two upper airway sites, while low, was significantly higher than that of the randomly paired sample distribution (10,000 permutations, P < 0.0001), which had a mean of 0.28 (SD = 0.004). The Spearman correlations between the top 15 genera from the NS and NPA are shown in Fig. 1b. Haemophilus has the greatest correlation between anatomic sites (ρ = 0.50). Additionally, there is a negative correlation between all of the top genera in NS (except Bacillus) and Staphylococcus in NPA, further illustrating the differences in Staphylococcus abundance between the NS and NPA samples. The between-site correlation network demonstrates that the top five genera in the NS and NPA co-occur within individuals, including Haemophilus and Moraxella (Fig. 2). Although Staphylococcus dominates the NS, the correlation network shows that Staphylococcus in NS is only significantly correlated with Staphylococcus and Acinetobacter in NPA.

Fig. 2

Network of significant intra-individual co-occurrence and co-exclusion associations between nasal swab and nasopharyngeal microbiota. The co-occurrence (green) and co-exclusion (red) relationships between the nasal swab (left) and nasopharyngeal (right) microbiota were assessed using four different methods: Spearman correlation, Pearson correlation, Bray-Curtis dissimilarity, and Kullback-Leibler divergence. To prevent the potential overestimation of associations that occur with the individual metrics, we retained in the network only associations between the top 15 genera that were found significant (P < 0.05) via bootstrapping for at least two of the aforementioned methods. Edges are weighted by the Spearman correlation value, with thicker lines indicating a larger correlation

Clustering the composite samples generated by adjoining the NS and NPA genus abundances for each individual demonstrated common within-individual patterns between the dominant genera in the NS and NPA samples (Fig. 3). Of particular interest are the first two clusters, which show that infants with Haemophilus- and Moraxella-dominant NS samples also have Haemophilus- and Moraxella-dominant NPA samples. Additionally, the NS clusters with a high abundance of Staphylococcus correspond to high abundances of the three major genera in the NPA (i.e., Haemophilus, Moraxella, Streptococcus), indicating that high-Staphylococcus samples from NS map to all of the previously identified NPA microbiota profile groups discussed below [21].

Fig. 3

Clustering and composition of composite microbiota. For each subject in the study, adjoining the abundances of the top genera from the nasal swab and nasopharyngeal aspirate microbiotas created a composite microbiota sample. The Bray-Curtis dissimilarity was computed between each composite sample, and partitioning around medoids clustering was performed on the resulting dissimilarity values using 10 clusters. The heat map displays the abundances of the top seven genera for each of the resulting clusters, revealing common patterns of genus dominance between the nasal swab and nasopharyngeal microbiota. HAE, Haemophilus; MOR, Moraxella; STA, Staphylococcus; COR, Corynebacterium; ENT, Enterobacter; MIX, multiple genera; STR, Streptococcus

Nasal swab microbiota profiles

Using PAM clustering, we previously reported four microbiota profiles generated from the NPA samples from infants with bronchiolitis: Haemophilus-dominant, Moraxella-dominant, Streptococcus-dominant, and mixed profiles [21]. In contrast to these four NPA microbiota profiles, PAM clustering of the NS samples generated six profiles: Haemophilus-dominant (7.2%), Moraxella-dominant (13.0%), Staphylococcus-dominant (44.5%), Corynebacterium-dominant (13.4%), Enterobacter-dominant (7.5%), and mixed (14.4%) profiles (Fig. 4). This PAM analysis of the NS gives a very large and well-defined cluster of subjects with a large abundance of Staphylococcus genus. Indeed, the Staphylococcus-dominant profile consists of 78% Staphylococcus and thus shows low bacterial richness and evenness (Table 2). Moreover, the infants in the Staphylococcus-dominant profile were younger compared to other profile groups (P < 0.001), with 85% of the infants in this profile being less than 6 months of age (Table 3). The infants in the NS Haemophilus-dominant profile were older (median 5 months; IQR 3–8 months), had higher weight (median 7 kg; IQR 6–8 kg), and were more likely to have used antibiotics prior to hospitalization (50.8%) as compared to the other profiles (all P < 0.001) (Table 3), results which are all similar to the NPA findings [21].

Fig. 4

Composition of nasal swab microbiota profiles. Partitioning around medoids was performed on the weighted UniFrac distances between only the nasal swab genus abundances (independent of the nasopharyngeal aspirate genus abundances) using six clusters. The heat map shows the abundance of the top seven genera for each cluster. HDP, Haemophilus-dominant profile; MDP, Moraxella-dominant profile; SDP, Staphylococcus-dominant profile; CDP, Corynebacterium-dominant profile; EDP, Enterobacter-dominant profile; MP, mixed profile

Table 2 Richness, alpha-diversity, and abundance by nasal swab microbiota profile
Table 3 Characteristics and clinical presentation of infants hospitalized for bronchiolitis by nasal swab microbiota profile

Nasal swab microbiota profiles and bronchiolitis severity

Using the NPA data, our group previously reported that infants with a Haemophilus-dominant NPA microbial profile had increased odds of intensive care use and risk of hospital length of stay of three or more days when compared to those with a Moraxella-dominant microbiota profile [21]. In the current analysis, we were able to replicate these severity outcome associations in NS microbiota profiles, finding that subjects with the Haemophilus-dominant NS profile were more likely to have intensive care use (unadjusted OR 6.20, P < 0.001; adjusted OR 6.43, P < 0.001) and a hospital length of stay of five or more days (unadjusted OR 3.57, P = 0.004; adjusted OR 4.31, P = 0.002) (Table 4).

Table 4 Unadjusted and multivariate associations of nasal swab microbiota profiles with bronchiolitis severity outcomes

Analysis of the NS microbiota profiles found higher odds ratios than the NPA microbiota profiles for the severity outcomes. Isolating the Haemophilus and Moraxella NS profiles showed increased odds of intensive care use for the Haemophilus-dominant profile and a protective association of the Moraxella-dominant profile (Table 4), coincident with the patterns observed in NPA profiles of these samples. While the NS microbiota profiles did not retain the significant association with a hospital stay of three or more days, they were significantly associated with a length of stay of five or more days. Thus, the association between the Haemophilus-dominant profile and a longer hospital length of stay, as well as the protective nature of Moraxella for the hospital length of stay outcome, was maintained in the NS data.


In this multicenter study of 815 infants with both NS and NPA samples, we found within-individual correlations between dominant genera in NS and NPA microbiota. However, the NS and NPA samples also revealed distinct compositions, with an increased abundance of Staphylococcus in the NS microbiota. Using an independent analysis of the NS microbiota, we were able to replicate a previously published association between NPA microbiota and bronchiolitis severity outcomes [21] despite sample processing differences. Our findings indicate that the associations of Haemophilus-dominant and Moraxella-dominant profiles with clinical outcomes should be consistent between NS and NPA sample types.

There was a modest but statistically significant within-individual relationship between the bacterial microbiota of the two upper airway sites. Further analysis of these correlations revealed common mappings between NS and NPA genus abundances, particularly the Haemophilus-dominant and Moraxella-dominant NS samples that were maintained in the NPA microenvironment. Beyond these two genera, the mapping between NS and NPA demonstrated inconsistency between the sample types among the dominant genera. For example, the majority of infants with dominant Staphylococcus in the NS samples had NPA samples that were dominated by Haemophilus, Moraxella, or Streptococcus. Thus, NS samples with a high abundance of Staphylococcus alone may not be informative for the corresponding most abundant genera in the NPA. This is further supported by the correlation network (Fig. 2), which shows that Staphylococcus in NS only co-occurs with Staphylococcus and Acinetobacter in NPA. We also found that some samples—particularly NS samples dominated by Staphylococcus—were almost entirely composed of a single genus. Regardless of the dominance of Staphylococcus in NS samples, the large study size allowed ample data for reliable analysis. Despite having compositional differences, the correlations between the NS and NPA microbiota confirm that NSs are a useful clinical sample type. However, researchers should be cautious of the complications imposed by the abundance of Staphylococcus in the NS samples.

Almost half of the MARC-35 infants had a Staphylococcus-dominant profile with relatively low bacterial diversity. The difference in bacterial richness and diversity between the Staphylococcus-dominant profile and the other NS profiles raises the question of whether the high abundance of Staphylococcus, commonly found in the anterior nares [20, 37,38,39], is associated with clinical outcomes. Although Staphylococcus is often found in the anterior nares of healthy patients [20, 39], studies have found a higher abundance of Staphylococcus in the anterior nares of patients admitted to the intensive care unit [38] and adults with chronic rhinosinusitis [37], which may imply a difference in NS Staphylococcus abundance between sick and healthy individuals. Of more relevance to infants with bronchiolitis, Staphylococcus aureus has been shown to enhance the replication and infectivity of rhinovirus [40] and influenza [41], and a Staphylococcus-dominant airway microbiota is associated with an increased likelihood of severe bronchiolitis [42]. However, because the 16S rRNA gene sequencing approach is insufficient to reliably resolve data at the species level, it is unclear whether the Staphylococcus OTU in our data is in fact Staphylococcus aureus or another species of Staphylococcus. While the species of Staphylococcus in our data may determine how it impacts other dominant genera, the presence of Staphylococcus nonetheless has potential relevance for respiratory outcomes.

In the present analysis, we have demonstrated the usefulness of this easily obtained sample type when compared with NPA samples. Specifically, we were able to recapitulate the relationship between Haemophilus-dominant nasal microbiota and bronchiolitis severity [21], as well as the protective nature of Moraxella in the airway microbiota [21, 43], with NS samples among infants with bronchiolitis (Table 4). Furthermore, the NS microbiota profiles preserved associations with age, weight, and antibiotic use (Table 3). We conclude that for associations with Haemophilus and Moraxella, NS and NPA provide similar results. However, because many Haemophilus- and Moraxella-dominant samples in NPA were dominated by Staphylococcus in the NS samples for the same individuals, the ability to predict bronchiolitis severity outcomes may be more limited for Staphylococcus-dominant NS samples. Additionally, the associations between other dominant microbiota profiles and clinical outcomes may differ between the anterior nares and nasopharyngeal microenvironments.

Our study has some potential limitations. First, variations between the NS and NPA microbiota may have been amplified due to differences in NS and NPA sample handling (e.g., temperature at which initially stored). Studies of fecal microbiota have found that while variations in transport media can significantly alter the microbial composition of microbiota samples, differences in storage temperature have relatively little effect on the resulting microbial composition [44, 45]. Although immediately freezing the NS samples would have allowed a direct comparison to the NPA samples, the NS samples were collected and processed in a manner easily replicable in outpatient clinics or in homes for community-based studies [46,47,48,49,50]. Mailing NS samples has been used for viral detection [47,48,49], but this sample collection technique has not previously been used to examine the microbiota, even though both viruses and the microbiota are associated with acute bronchiolitis severity outcomes [8, 43, 51] as well as other respiratory outcomes [41]. Although viral detection is improved when nasal samples are collected from deeper within the nasopharynx than from the anterior nares [52], the more easily accessible NS samples have been useful for detecting viruses in community studies [53]. Allowing participants to provide a mailed NS sample would remove obstacles typically faced by researchers who want to examine both viral and microbial exposures from community-based samples. In this study, despite the differences in initial storage temperature of the samples, we were able to use the NS sample data to replicate the previous association between NPA microbiota and bronchiolitis severity.

In addition, as the samples in our study were obtained in the context of acute bronchiolitis hospitalization, the within-individual correspondence we observed may not be reproducible in healthy infants. However, our study demonstrates that NS microbiota data are relevant in the context of acute illness and are associated both with the microbial composition of the nasopharynx and with clinically relevant outcomes. Another potential limitation is that we did not address the dynamics of the microbiota at these sites. The airway microbiota changes over time via environmental exposure and the natural progression of microbial composition in the airway [8, 43, 54]. Although our findings show the associations in infant nasal microbiota, the relationship between anterior nares samples and nasopharyngeal samples in older children is unknown. Our data should facilitate further investigations into this important domain.


In this multicenter cohort study of infants hospitalized with bronchiolitis, we found modest but statistically significant intra-individual correlations between NS and NPA microbiota, especially for Haemophilus and Moraxella. Given the overlap of the microbiota structure between the sample types, we were able to use NS data to replicate the previously identified associations between NPA microbiota and severity of illness (as measured by intensive care use and hospital length of stay). Our investigation also reveals important differences between NS and NPA samples—particularly the high abundance of Staphylococcus in the NS. However, despite the variations between the NS and NPA microbiota and the differential handling of the specimens, the considerable overlap of the microbiota between the anatomic sites indicates that NS can provide robust and useful samples in young children.