Introduction

Humans constantly inhale fungal spores [1]. Their small size enables deposition in the airways, with some spores even reaching the alveoli. In healthy people with normal lung function, spores are cleared by the lung defences [2]. However, in some patients with a long-term respiratory condition such as asthma, chronic obstructive pulmonary disease (COPD), bronchitis or cystic fibrosis (CF), spores can evade the lung immune response, germinate and persist in the airways [3]. Changes in nutrient composition in the lungs of patients with chronic respiratory infections [4], impaired mucociliary clearance or increased mucus secretion [4, 5] might be responsible for increased fungal burdens in these patients. Additionally, fungal persistence in the lungs of patients with long-term respiratory diseases has been associated with worse disease outcome and an increased risk of developing lethal fungal diseases such as chronic pulmonary aspergillosis [6].

The internal surface area of the human lung is approximately the size of a tennis court (ranging from 30 to 60 m² per lung) and fungal spore inhalation is estimated to vary between 500 and 5000 spores daily [7]. This suggests that approximately one fungal spore is deposited per 100 cm² region of the lung per day as a result of inhalation [8]. Ungerminated spores are likely to be rapidly cleared by the mucociliary escalator and by resident macrophages, and therefore the levels of fungal spores in a healthy lung are extremely low [5]. Colonisation begins when spores germinate to form a stable locus of growth [9], and after 24–48 h of growth a single uninucleate spore or yeast cell can form a mycelium or dispersed yeast colony containing more than 100 nuclei [10]. It is not clear how large a fungal community can be supported in the healthy lung, but recent results suggest that individuals considered to have no fungal disease can harbour as many fungi as those diagnosed with overt infection [11••, 12]. Bacterial gut microbiome studies have recently shown dramatic impacts on wider human health [13], and so it is crucial to understand the composition of the lung mycobiome with respect to fungal commensal and pathogen communities, both to improve diagnosis of disease and to understand the effect such communities may have on health in general.
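As a rough check of the deposition estimate above, the quoted ranges can be combined directly. The sketch below assumes two lungs, uniform deposition over the whole internal surface and that every inhaled spore is deposited; none of these assumptions comes from the cited studies, and the function name is purely illustrative.

```python
def spores_per_100_cm2(spores_per_day, area_m2_per_lung, lungs=2):
    """Spores deposited per 100 cm^2 of lung surface per day (uniform deposition assumed)."""
    total_area_cm2 = area_m2_per_lung * lungs * 10_000  # 1 m^2 = 10,000 cm^2
    return spores_per_day / (total_area_cm2 / 100)


low = spores_per_100_cm2(500, 60)    # fewest spores spread over the largest surface
high = spores_per_100_cm2(5000, 30)  # most spores spread over the smallest surface
print(f"{low:.2f} to {high:.2f} spores per 100 cm^2 per day")
```

Under these assumptions the density spans roughly 0.04 to 0.8 spores per 100 cm² per day, so a figure of about one spore per 100 cm² corresponds to the upper end of the inhalation range.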

Until relatively recently, our knowledge of the lung mycobiome relied on culture-based techniques. These techniques have low sensitivity, are generally slow and, once the fungus has grown, require a further step of species identification using microscopy, molecular techniques or MALDI-ToF [14]. The low natural levels of fungus in the lungs coupled with the low sensitivity of culture-based techniques resulted in an assumption that lungs contain no fungus. Bacterial microbiome work and the application of 16S sequencing demonstrated that a complex bacterial community colonises the lungs [15]. Following this, next-generation sequencing (NGS) studies also confirmed the presence of a healthy lung mycobiome [16, 17••]. With NGS techniques becoming increasingly accessible, there is great potential for dissecting fungal species distribution and relative abundance in the lungs. However, such methods are complex and contain multiple stages which, without careful consideration, are prone to bias and the introduction of error (Fig. 1a). Such issues can compromise the conclusions of a mycobiome study and have limited our knowledge of the airway mycobiome. Here, we will describe our current knowledge on the impacts of the lung microbiome in chronic and fungal lung diseases and critically analyse the current methods used to study the human lung mycobiome.

Fig. 1
a Critical steps of next-generation sequencing approaches to study the mycobiome. b Markers used for fungal speciation. Pan-species taxonomy can be estimated using ribosomal DNA markers such as ITS1 and 2 or 18S. Correct speciation can often only be accomplished by including tubulin and calmodulin sequences in the phylogenetic analysis. Few markers are published for distinguishing between members of the same species; however, microsatellite markers and certain genes such as CYP51 are known to vary between isolates and could be used in isolate characterisation.

The Lung Mycobiome in Health and Respiratory Diseases

Inter-individual diversity in the composition and load of the respiratory mycobiome is very high amongst healthy individuals. Ascomycota and Basidiomycota are the main phyla of fungi detected in the respiratory airways of healthy individuals. The most abundant fungi belong to the genera Cladosporium, Eurotium, Penicillium and Aspergillus, although Candida, Neosartorya, Malassezia, Hyphodontia, Kluyveromyces and Pneumocystis have also been detected [18]. Until 2012, Candida was the dominant genus identified in oral washes [16, 17••]. However, using NGS methods, Cui et al. [19] have reported an overlap between the mycobiome in bronchoalveolar lavage (BAL) and oral washes from healthy individuals. This suggests that, as with bacteria, fungi residing in the upper respiratory tract and external environment can enter the lower respiratory tract [20]. However, fungal species diversity is higher in BAL than in the oral cavities of healthy individuals.

The Mycobiome and Chronic Lung Diseases

Lung Mycobiome in Asthma

Asthma is a chronic lung condition characterised by airway hyperresponsiveness leading to wheezing, breathlessness and coughing. The disease affects over 200 million people worldwide and annually results in 400,000 deaths [21,22,23]. Exposure to environmental fungal spores has been associated with worsening asthma symptoms, lung function, hospitalisations and death. One possible explanation is that long-term colonisation of atopic individuals by fungi causes fungal sensitisation, resulting in constant airway stimulation and leading to poorer asthma outcomes [24]. In a case-control study aiming to determine the fungal diversity of induced sputum samples from 30 asthmatic patients and 13 non-atopic controls using 18S pyrosequencing, van Woerden et al. observed that samples from asthmatic patients were significantly more diverse than samples from the controls [25]. Malassezia pachydermatis, previously associated with atopic dermatitis, was also one of the most frequent species found in the airways of asthmatic patients [25]. Additionally, Fraczek et al. [11••] have described the mycobiome composition in BAL from asthmatic patients using ITS1 Illumina sequencing. They found levels of fungus to be highly variable between individuals, but severe asthmatics showed the highest fungal burdens compared to patients with allergic bronchopulmonary aspergillosis (ABPA) or patients with mild asthma. Remarkably, the observed differences in this study were due to the increased level of A. fumigatus complex fungi. This finding was confirmed by real-time PCR targeting the ITS regions; however, the use of a single-copy gene to confirm the fungal burden would have been more suitable [11••].

Lung Mycobiome in Bronchiectasis

Bronchiectasis is a chronic inflammatory lung disease characterised by an abnormal widening of one or more airways [26]. Clinical symptoms of bronchiectasis include productive cough, fatigue, hemoptysis and infective exacerbations. In these patients, impaired mucociliary function causes mucus to pool in parts of the airways, promoting fungal and bacterial persistence [27, 28]. The most common species isolated from the airways of patients with bronchiectasis are C. albicans (prevalence, 34–48%) and Aspergillus species (7–24%) [27,28,29]. Other Candida species (C. parapsilosis, C. glabrata), Saccharomyces cerevisiae, Trichosporon sp., Scedosporium or Penicillium have also been linked to bronchiectasis [27]. A recent mycobiome sequencing study reported that Candida, Penicillium and Saccharomyces are commonly identified in both healthy controls and patients with bronchiectasis, while Aspergillus, Alternaria, Botrytis, Clavispora and Cryptococcus were associated only with bronchiectasis [30••]. The authors described geographical differences in the bronchiectasis-associated genera, with Aspergillus particularly relevant in samples derived from Asia and the remaining genera associated with European samples. Moreover, using species-specific qPCR, the authors identified a significant positive correlation between the abundance of A. terreus and exacerbations. Using immunological assays and other criteria which included A. terreus abundance, the authors found 75% of patients sensitised to Aspergillus and 18% with ABPA. Additionally, the abundance of Aspergillus, Penicillium and Cryptococcus increased with disease severity [30••].

Lung Mycobiome in CF

CF is a genetic disorder caused by a dysfunction of the CF transmembrane conductance regulator (CFTR) protein [31]. Absence of functional CFTR activity leads to reduced chloride secretion and deficient fluid transport by epithelial cells. This results in thick mucus secretion which facilitates pathogen adhesion and growth [31]. In early childhood, microbial load in the lungs of patients with CF is undetectable by culturing methods [32]. However, as the disease progresses, the lung microbiome of patients with CF is dominated by the bacterium Pseudomonas aeruginosa [33]. Some recent studies have reported that co-colonisation of the airways by C. albicans or A. fumigatus with P. aeruginosa leads to more exacerbations and decreased lung function [34, 35]. Other fungal species frequently detected in the airways of patients with CF include Penicillium species, non-fumigatus Aspergillus species, Scedosporium, Exophiala dermatitidis and Cladosporium species [34, 36]. A positive correlation between community richness and patient health has been demonstrated using NGS [37]. Several mycobiome studies have confirmed the predominance of Candida species in the CF mycobiome. Interestingly, two studies have identified patient subsets which harbour mycobiomes containing predominantly Aspergillus section Fumigati [38••, 39, 40]. However, as all studies so far have used sputum, it would be interesting to investigate whether the same conclusion is reached when analysing BAL. Although the prevalence of fungal colonisation amongst patients with CF is 75% for yeast and up to 37% for filamentous fungi, the number of fungal infections in those patients is low [40, 41].

Lung Mycobiome in COPD

COPD is a heterogeneous lung condition characterised by chronic lung inflammation and airflow limitation [42]. Patients with COPD often suffer from exacerbations triggered by bacterial infections due to Streptococcus pneumoniae, Haemophilus influenzae or Moraxella catarrhalis [43]. However, the use of immunosuppressants in patients with COPD and the extensive use of antibiotics mean fungi are frequently detected in the lungs of these patients [44], suggesting the mycobiome as a potential causative factor in the development of COPD [45].

Aspergillus spp. are frequently isolated from the airways of COPD patients during exacerbations (16.6%) [46] and follow-up (14.1%) [47]. Interestingly, Pneumocystis jirovecii colonisation is described in one-third of severe COPD patients, leading to increased airway obstruction independent of smoking, increased inflammation and emphysema [19, 46, 48,49,50]. A mycobiome sequencing study of COPD patients during exacerbations confirmed a high abundance of Aspergillus, Candida, Phialosimplex, Penicillium, Cladosporium and Eutypella [51]. This study did not identify P. jirovecii; however, this is likely to be because the ITS1 primers used do not amplify this pathogen [11••].

Mycobiome and Fungal Lung Diseases

Our knowledge of the fungal species present in the lungs of patients with fungal lung diseases is limited to culture-based studies, leading to an incomplete picture of the problem [52]. Although Aspergillus is frequently isolated from the airways of at-risk patients, not all develop fungal disease. It is estimated that more than 14 million people worldwide suffer from aspergillosis [53•].

Bronchial colonisation by A. fumigatus in some patients with asthma (2%) and CF (4%) drives the development of ABPA [54]. This hypersensitivity reaction to A. fumigatus antigens is characterised by immunoglobulin E production, eosinophilia, mast cell degranulation and bronchiectasis [55]. Although A. fumigatus is the main causative agent, other taxa such as Basidiomycota are also present [11••]. Cryptococcus neoformans and Scedosporium apiospermum are also associated with a similar clinical disease referred to as allergic bronchopulmonary mycosis [21].

Chronic pulmonary aspergillosis (CPA) is a slowly progressive disease caused by a persistent fungal infection of the lungs in some patients with a previous lung disease such as COPD, prior tuberculosis infection, sarcoidosis or lung cancer [56]. In those patients, A. fumigatus can grow inside pre-existing lung cavities and damage the surrounding parenchyma [57]. A. fumigatus is the most common species associated with CPA although A. niger, A. flavus, A. terreus and A. nidulans have also been implicated [58]. However, there are currently no studies comparing the mycobiome composition between patients with CPA and at-risk populations using NGS.

In recent years, there have been reports of a new clinical form of aspergillosis called Aspergillus bronchitis, which affects 9% of patients with CF [59, 60]. In these patients, Aspergillus grows in the upper airways and mycelia form small masses which can be expectorated. The fungus does not invade the lung tissue but mucosal alteration may occur. While A. fumigatus is the most common species, A. niger, A. terreus [60] and A. flavus can also be involved [61].

Our knowledge of the respiratory mycobiome in invasive disease is poor despite the high associated mortality [53•]. Aspergillus colonisation of the airways is a poor prognostic indicator of invasive disease in severely immunosuppressed patients [61,62,63,64]. Additionally, some of those patients are frequently carriers of C. albicans in the oral wash due to immunosuppression [65].

Limiting Factors of Sample Processing in Mycobiome Studies

The environments of the upper and lower respiratory tracts (URT, LRT) differ significantly and the mycobiomes of these sites are reported to be distinct [16, 19]. However, accurately sampling the LRT is difficult, as samples can often be contaminated by the URT during the process. BAL may provide the most accurate sample type for the LRT, with the lowest chance of URT contamination. However, the bronchoscopy procedure is invasive and less commonly performed. For this reason, most mycobiome studies to date have studied sputum samples.

The DNA extraction method is critical for mycobiome studies. Fredricks et al. [66] have demonstrated that bead beating and enzyme-based methods are often biased towards filamentous fungi and yeast, respectively. Therefore, it is difficult to generate a DNA extract which accurately represents the fungal community. Moreover, research is now beginning to simultaneously assess the micro- and mycobiome of human body sites. It is therefore vital that such analyses choose an appropriate and consistent DNA extraction method which will provide efficient extraction from all cell types and comparable extraction efficiency from different sample types. Nguyen et al. performed 16S and internal transcribed spacer (ITS) sequencing and identified Candida as the predominant fungal genus in CF sputum samples [67]. However, the enzyme-based DNA extraction protocol they used may have favoured the extraction of yeast compared to filamentous fungi.

Reproducibility in Mycobiome Research

A high person-to-person variation in mycobiome composition has been reported. Few studies to date have analysed samples in replicates and most only include technical replicates at the PCR stage [37, 67]. Bittinger et al. [20] performed repeat extractions from samples and found poor reproducibility between replicates, with instances of high abundance of an operational taxonomic unit (OTU) in one replicate and absence in another. This is likely a mycobiome-specific issue as they did not observe such variation between replicates when performing bacterial 16S sequencing of lung samples [16]. Therefore, it may be that the apparent variability of the human mycobiome is not a true biological observation and is instead due to the poor reproducibility of mycobiome methods. One possible contributing factor is that DNA extraction from fungi is more difficult, requiring mechanical disruption or enzymatic lysis to break their robust cell walls. Without optimisation, such methods can cause DNA degradation and reduce yield and PCR efficiency [18, 68•], potentially producing an inconsistent fungal community from sample to sample. Inclusion of large numbers of replicates has often been limited for financial reasons or due to the scarcity of sample material. Sequencing costs are now much lower and careful study design should help to ameliorate the risks of sample-to-sample variation. Further investigation into the reproducibility of mycobiome biological and technical methods is required and future studies should sequence replicates to gain a more accurate representation of a community within a body site.
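One simple way to quantify agreement between replicate extractions is a dissimilarity score between their OTU profiles. The sketch below uses Bray-Curtis dissimilarity on relative abundances; this metric is our choice for illustration and is not stated to be the one used in the cited studies, and the replicate counts are hypothetical.

```python
def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two OTU count profiles (dicts of OTU -> reads)."""
    # Work on relative abundances so unequal sequencing depth does not dominate.
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("each replicate needs at least one read")
    otus = set(counts_a) | set(counts_b)
    shared = sum(
        min(counts_a.get(otu, 0) / total_a, counts_b.get(otu, 0) / total_b)
        for otu in otus
    )
    return 1.0 - shared  # 0 = identical profiles, 1 = no OTUs in common


# Hypothetical replicate extractions from one sample.
replicate_1 = {"Candida": 900, "Aspergillus": 100}
replicate_2 = {"Candida": 450, "Cladosporium": 50}
dissimilarity = bray_curtis(replicate_1, replicate_2)
```

Here the two replicates agree on the dominant genus but each contains a minor OTU that is absent from the other, which is the pattern described above; the score stays low because only a tenth of each profile is unshared.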

Contamination Is Particularly Problematic for Mycobiome Studies

Contamination has the potential to be a significant problem in mycobiome determination. It has been established that fungal DNA can often contaminate PCR reagents [69•] and, as fungi can constitute up to 11% of airborne fine particles [70], they can readily contaminate samples during preparation. Including negative controls in the experiment can help to uncover such contamination; however, they are often omitted. Bittinger et al. [71] assessed possible contamination sources when analysing the respiratory mycobiome, including water, sterile swabs, lab surfaces and the bronchoscope suction channel. Alarmingly, they identified no significant difference between the fungal composition of BAL from healthy individuals and that of contamination controls. To further assess this finding, the authors used PCR product concentrations to correct OTU abundance. Although significantly different, the yield of negative controls was within an order of magnitude of BAL yield, suggesting that a considerable proportion of OTUs identified from BAL were likely to arise from contamination. They used these findings to define a global abundance threshold over which an OTU is unlikely to be a contaminant [71]. Such an approach may also help to distinguish between transient organisms and colonisers of the lung.
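The idea of correcting OTU abundance by PCR product concentration and then applying a global threshold can be sketched as follows. This is an illustration only: the threshold rule used here (the highest corrected abundance seen in any negative control) is an assumption and is not claimed to be the rule defined in [71]; all function names, yields and read counts are hypothetical.

```python
def corrected_abundance(otu_reads, pcr_yield):
    """Scale each OTU's read fraction by the sample's PCR product concentration."""
    total = sum(otu_reads.values())
    if total == 0:
        return {}
    return {otu: reads / total * pcr_yield for otu, reads in otu_reads.items()}


def global_threshold(negative_controls):
    """Highest corrected abundance of any OTU in any negative control (assumed rule)."""
    return max(
        (value
         for reads, pcr_yield in negative_controls
         for value in corrected_abundance(reads, pcr_yield).values()),
        default=0.0,
    )


def otus_above_threshold(sample_reads, sample_yield, threshold):
    """OTUs whose corrected abundance exceeds the contamination threshold."""
    corrected = corrected_abundance(sample_reads, sample_yield)
    return sorted(otu for otu, value in corrected.items() if value > threshold)


# Hypothetical data: (OTU read counts, PCR product concentration in ng/µl).
controls = [({"Malassezia": 80, "Cladosporium": 20}, 0.5), ({"Cladosporium": 100}, 0.2)]
bal_reads = {"Aspergillus": 700, "Malassezia": 200, "Cladosporium": 100}
threshold = global_threshold(controls)
retained = otus_above_threshold(bal_reads, 3.0, threshold)
```

In this toy example Cladosporium falls below the threshold set by the controls and would be treated as a likely contaminant, while Aspergillus and Malassezia are retained.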

NGS amplicon library preparation is prone to carry-over PCR contamination. Laboratories should consider implementing a workflow as used in clinical laboratories (described in International Organization for Standardization document ISO 24276:2006). Such workflows include physical separation of the pre-PCR and post-PCR steps of a protocol, with no crossover of reagents or equipment. In addition, performing library preparation steps (particularly pre-PCR) in a laminar flow cabinet which has been cleared of surface DNA contamination is advised.

Another notable problem which can occur during a high-throughput sequencing experiment is sample barcode cross-talk. This refers to a sequence read which contains an incorrect index derived from a different sample within the library pool [72]. Index misassignment can be introduced at the oligo synthesis and processing stages, during PCR steps or even on board the sequencing instrument. A method to significantly reduce the impact of these events is to use unique, dual barcodes during library preparation [72].
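The benefit of unique dual barcodes can be illustrated with a minimal demultiplexing sketch: because each sample owns both of its indexes, a read carrying one index from each of two samples matches no expected pair and can be discarded rather than misassigned. The index sequences and sample names below are hypothetical, and real demultiplexers additionally tolerate sequencing errors in the index reads.

```python
EXPECTED_PAIRS = {  # hypothetical (i7, i5) index pairs, one unique pair per sample
    ("ATCACG", "TTAGGC"): "sample_A",
    ("CGATGT", "TGACCA"): "sample_B",
}


def assign_read(i7, i5, expected_pairs=EXPECTED_PAIRS):
    """Return the sample for an index pair, or None if the pair was never pooled."""
    return expected_pairs.get((i7, i5))


correct = assign_read("ATCACG", "TTAGGC")  # both indexes belong to sample_A
swapped = assign_read("ATCACG", "TGACCA")  # i7 from sample_A, i5 from sample_B
```

With a combinatorial design, in which indexes are shared between samples, the swapped pair could correspond to a real sample and the read would be silently misassigned; with unique dual indexes it is recognisable as cross-talk.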

Sequencing Methodologies

In a mycobiome workflow, amplicons are generated from samples using primers with broad fungal specificity (see Fig. 1a). Studies thus far have targeted ribosomal RNA genes or the ITS regions. However, these targets are present in variable copy numbers in fungal genomes, which may hinder the ability of sequencing to provide an accurate, quantitative representation of a community [73]. For example, rDNA copy numbers can vary 13-fold between isolates within a population of A. fumigatus strains [74,75,76]. Furthermore, there can be inter- and intra-species sequence variation which can complicate data analysis and identification to the species level [75, 76]. Nevertheless, these targets are commonly used in the literature (see Table 1), particularly ITS1, which was suggested as the universal barcoding target for fungi [82]. A few studies have analysed the accuracy of ITS1 sequencing using mock communities and found significant bias for some groups [81, 83, 84]. Variation in amplicon size may contribute to such bias, as the commonly used ITS1 amplicon is known to contain an intron in some fungi. For example, underrepresentation of Candida glabrata has been observed and is likely due to its 420 bp intron-containing amplicon being amplified with relatively less efficiency during PCR. In addition, primer coverage is not complete [85, 86]. For example, one study failed to identify Sporothrix schenckii and Rhizomucor pusillus when using ITS1 due to a lack of primer compatibility. Lastly, species resolution using the ITS1 amplicon is poor within some fungal groups, such as Aspergillus sections, Fusarium and Pneumocystis [87].

Table 1 Next-generation sequencing studies of the human lung mycobiome

Hoggard et al. assessed the use of four amplicon targets (ITS1, ITS2, SSU and LSU) with mock communities consisting of known members of the human mycobiota [83]. This analysis found an overrepresentation of Candida albicans and Trichosporon dermatis with all targets. The authors also analysed sinonasal samples and found the ITS targets to be the most accurate. However, there were significant differences in the communities identified, with over 60% of total taxa identified by only ITS1 or ITS2. A recent analysis of mock communities of 53 fungal species, including several known respiratory pathogens, demonstrated that ITS1 provides a more accurate representation than ITS2 [81]. However, many species were under- or over-represented compared to the expected community using either target. One option is to combine an ITS target with a secondary barcode which can provide improved resolution, such as translation elongation factor 1-alpha and beta-tubulin [85] (Fig. 1b). Moreover, targeting a single-copy gene may provide more accurate quantification of a community, and there is a need for this with respect to the diseased mycobiome. The mere presence of a species is not suggestive of clinical relevance, and the normalisation procedures inherent in amplicon sequencing library preparation result in a loss of important fungal burden information. Although this lack of data can be supplemented with qPCR analysis, this is not feasible for providing a complete, quantitative picture of a fungal community. The abundance-corrected method introduced by Bittinger et al. [71] moves toward gaining a more informative analysis. Moreover, spike-in controls have been suggested as a method to improve the accuracy of ITS sequencing data; however, the authors note that due to the limitations of ITS, researchers should not solely rely on read numbers to determine relative abundance [88]. Therefore, variation in the copy number of ribosomal RNA regions discussed earlier remains a confounding factor in mycobiome studies.
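The confounding effect of copy-number variation can be made concrete with a small calculation. The sketch below assumes that amplicon reads scale with the number of genomes multiplied by the rDNA copies per genome, with equal extraction and PCR efficiency for all taxa; the copy numbers are hypothetical and chosen only to reproduce a 13-fold difference.

```python
def expected_read_fractions(genomes, rdna_copies):
    """Expected amplicon read share per taxon if reads scale with genomes x rDNA copies."""
    templates = {taxon: genomes[taxon] * rdna_copies[taxon] for taxon in genomes}
    total = sum(templates.values())
    return {taxon: count / total for taxon, count in templates.items()}


# Two isolates present at equal genome numbers but differing 13-fold in rDNA copies.
genomes = {"isolate_1": 100, "isolate_2": 100}
rdna_copies = {"isolate_1": 40, "isolate_2": 520}  # hypothetical copy numbers
fractions = expected_read_fractions(genomes, rdna_copies)
```

Although both isolates are equally abundant, isolate_2 would be expected to contribute 13/14 (about 93%) of the reads, which is why a single-copy target gives a closer estimate of cell numbers under these assumptions.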

Sequencing Platforms

There are now several NGS platforms which can be used for mycobiome analysis. Motooka et al. [89] compared the PacBio, MiSeq and IonPGM instruments when analysing mock communities using ITS1. The authors demonstrated the relatively poor performance of an IonPGM and highlighted that the level of quality trimming applied to reads can significantly alter results obtained using this instrument. Therefore, it is particularly important for studies using this technology to clearly report on the quality trimming applied to data. Moreover, we have observed (unpublished data) that the IonPGM produces truncated reads when sequencing the ITS1 amplicon of Penicillium chrysogenum and Aspergillus species. This may be due to a GC-rich region which hinders polymerase progression [19]. These issues may have had some influence on the lack of Aspergillus species identified in respiratory mycobiome studies using this instrument [19]. Illumina sequencers have become a popular choice for mycobiome studies (Table 1), and the particularly accurate performance of the Illumina MiSeq has been highlighted [89, 90]. A mycobiome protocol based on the well-established 16S equivalent using Nextera XT indexes is now provided by Illumina, making the approach easily accessible. However, this protocol does not recommend introducing linkers into primers for the initial ITS PCR as some adaptor regions could be homologous to some ITS targets [91].

Bioinformatic Analysis

Another significant hurdle during a mycobiome study is data analysis. Firstly, database choice is a crucial factor. There are several publicly available databases of fungal sequences, including the UNITE and ISHAM Barcoding databases [92]. However, although often curated, they can be incomplete and contain inconsistencies due to taxonomic reassignments. For example, analysing mock communities using ITS1 resulted in over 15% of sequences being assigned as ‘unclassified’ when using the UNITE database [83]. In addition, 85% of taxa within UNITE belong to the Dikarya, causing a bias toward this sub-kingdom when performing the taxonomic assignment of mycobiome data [18]. Secondly, the method of data analysis significantly influences the community composition outcome. The default settings of common software packages used for mycobiome analysis (such as QIIME and Mothur) have been found to produce inaccurate species-level representations of an in silico mock community. The authors found that a closed-reference QIIME analysis gave the best performance, correctly assigning over 70% of sequences at the species level. This still leaves room for considerable error and it is suggested that performing manual BLAST analyses can improve the robustness of the classification [68•].

Conclusions

NGS studies of the respiratory mycobiome are limited and there is significant heterogeneity in the methods used. There are clear differences in sample type, extraction method, amplicon target, sequencing methodology and data analysis, all of which introduce considerable variation. In addition, mycobiome workflows are particularly prone to contamination and bias which calls for strict procedures that are not yet universally adopted.

The studies performed to date indicate that fungal diversity is higher in healthy people, while Aspergillus and Candida are the most abundant genera in patients with chronic respiratory diseases and fungal diseases [11••, 17••, 23, 38••, 80••]. However, there is a need for standardisation of mycobiome methods in order to gain a more complete picture of the fungal communities within healthy and diseased lungs. It is important to highlight that even though fungi are present in the airways of patients with chronic pulmonary diseases, not all of them develop fungal disease and this might be linked to the presence of other factors such as environmental exposure, genetics or drug treatment.

Overall, mycobiome studies are clearly limited by several factors such as (1) lack of standardisation; (2) absence of an amplicon target that allows for species identification and accurate community representation; (3) small and heterogeneous patient study populations; and (4) the evaluation of different samples such as oral washes, sputum or BAL.