Trends in yeast diversity discovery

Yeasts, usually defined as unicellular fungi, occur in various fungal lineages. Hence, they are not a taxonomic unit, but rather represent a fungal lifestyle shared by several unrelated lineages. Although the discovery of new yeast species occurs at an increasing speed, at the current rate it will likely take hundreds of years, if ever, before they will all be documented. Many parts of the earth, including many threatened habitats, remain unsampled for yeasts and many others are only superficially studied. Cold habitats, such as glaciers, are home to a specific community of cold-adapted yeasts, and, hence, there is some urgency to study such environments at locations where they might disappear soon due to anthropogenic climate change. The same is true for yeast communities in various natural forests that are impacted by deforestation and forest conversion. Many countries of the so-called Global South have not been sampled for yeasts, despite their economic promise. However, extensive research activity in Asia, especially China, has yielded many taxonomic novelties. Comparative genomics studies have demonstrated the presence of yeast species with a hybrid origin, many of them isolated from clinical or industrial environments. DNA-metabarcoding studies have demonstrated the prevalence, and in some cases dominance, of yeast species in soils and marine waters worldwide, including some surprising distributions, such as the unexpected and likely common presence of Malassezia yeasts in marine habitats.


Introduction
Within the kingdom Fungi, yeasts (Dikarya, Ascomycota and Basidiomycota) are not precisely defined. The latest 5th edition of 'The Yeasts, a Taxonomic Study' (TYTS) defines yeasts as follows: '…, yeasts, whether ascomycetes or basidiomycetes, are generally characterized by budding or fission as the primary means of asexual reproduction and have sexual states that are not enclosed in fruiting bodies' (Kurtzman et al. 2011b). These authors, however, also indicated several exceptions and commented on the imprecise border between yeasts and the dimorphic filamentous fungi that form yeast-like stages as well as yeast lineages that display strictly filamentous growth. From the above, it is clear that yeasts occur in division Ascomycota, mainly in subdivision Saccharomycotina (so-called budding yeasts), and Taphrinomycotina (that also includes so-called fission yeasts), as well as in three subdivisions of Basidiomycota, namely Ustilaginomycotina, Pucciniomycotina, and Agaricomycotina (Li et al. 2021). Some asexual and sexual reproductive structures of different yeast species are shown in Fig. 1. Although most yeast and yeast-like species do not form a complex macroscopic multicellular fruiting body, exceptions do exist. For example, among dimorphic fungi in Basidiomycota, some jelly fungi (e.g., Tremella, Phaeotremella, and Sirobasidium) form conspicuous fruiting bodies, but also have a distinct cryptococcoid budding yeast stage. Similarly, some genera belonging to Saccharomycotina do form extensive hyphae, e.g., species of the genera Eremothecium, Geotrichum, and Hyphopichia. Throughout this work we follow the rather pragmatic definition given in the 5th edition of TYTS.
During the last two centuries, yeast diversity has been studied extensively (Boekhout 2005; Kurtzman et al. 2015; Péter et al. 2017). Yeasts are involved in many applied fields, such as brewing, baking, wine making, and distilling, as well as in many other conventional and non-conventional fermentations. Several yeasts are among the most widely used model species in biomedical research, e.g., Saccharomyces cerevisiae Meyen ex E.C. Hansen and Schizosaccharomyces pombe Lindner, while others cause superficial or invasive infections of humans and animals, e.g., Candida albicans (C.P. Robin) Berkhout and Cryptococcus neoformans (San Felice) Vuill. Historically, most investigations of yeast diversity were conducted in affluent countries in the northern hemisphere, but due to a shifting global economy and wider availability of technologies, this pattern in biodiversity exploration is changing (Kurtzman et al. 2015, Fig. 2). Recently, countries in Asia (particularly China and Thailand) and South America (such as Brazil and Argentina) have led significant research to discover the local yeast biodiversity (see below). Despite recent progress, many geographical areas and biomes on earth remain almost unexplored (Kurtzman et al. 2015; Yurkov 2017). The rapid growth of our knowledge of yeast species diversity originates from both the discovery of novel species in nature and the application of more advanced identification techniques. Because of the application of more sophisticated nutritional tests, biochemical and molecular characterizations, and the use of DNA barcode sequencing, earlier described species were either lumped or split. Taxonomic novelties can also be discovered among previously isolated and preserved yeast strains in culture collections. Some of the latter maintain strains isolated by pioneers of yeast ecology such as L.J. Wickerham, J.P. van der Walt, and H.J. Phaff, who were active in the 1940s-1960s (e.g., Boundy-Mills et al. 2016).
Over the past 20 years, several underexplored habitats have been extensively investigated, resulting in a considerable number of new species. Among other sources, many yeasts were isolated from fungivorous beetles and nitidulid beetles associated with ephemeral flowers (reviewed in Lachance 2006; Blackwell 2017), and recent studies focused on soil yeasts reported up to 30% potentially novel species (reviewed in Yurkov 2018). Geographically, yeast biodiversity assessments covered a broad range, from polar regions to the tropics, but many areas and ecosystems are still unexplored (Fig. 2, see below).
The dimension and origin of yeast diversity can be assessed from literature, nomenclature, and sequence databases. Each approach has its own limitations. It is difficult to perform a reliable diversity estimation from the literature for several reasons. First, scientific literature is not universally accessible for researchers, and taxonomic publications on yeasts outnumber ecological ones. Second, publications focusing on taxonomic novelties are often incomplete. Publications with descriptions of novel species do not necessarily report the sampling strategy, effort, abiotic and biotic ecological parameters and, sometimes, even lack the name of the sampled animal or plant, and, therefore, it is difficult to make inferences about ecology or biodiversity using such taxonomy papers.
[Displaced figure caption fragment] … be kept in mind that local collections hold yeast diversity from their own locales. The isolates from South Africa were largely made by the late J.P. van der Walt and collaborators …
Nucleotide sequence databases have accumulated an extraordinarily large number of records of fungal strains and sequences from cultures and from the environment (Lücking et al. 2020) but are problematic as they contain misidentifications or lack taxonomic annotations altogether (Stavrou et al. 2018). Although sequence repositories provide annotation tools (tags and source modifiers), sequence records are often not accompanied by any ecologically meaningful data, such as the substrate, details on the applied methodology, and the detected number of individuals. The description of new species from environmental DNA libraries is contentious, and, most likely, major improvements in all aspects of methodology are required to satisfy commonly agreed quality standards. The International Code of Nomenclature for algae, fungi, and plants (ICNafp, Article F5, Chapter F in the Shenzhen Code, Turland et al. 2018; May et al. 2019) requires registration of nomenclatural novelties applied to organisms treated as fungi and published on or after 1 January 2013 in one of the three main repositories (viz., Fungal Names, Index Fungorum, MycoBank). Earlier records are being added to these databases by curators of these repositories. Such records provide a good overview of the progress in new species discoveries, though there is still a lack of information on the ecological and community-related background. As a follow-up to the printed version of TYTS, an online tool, theyeasts.org (https://theyeasts.org), is currently under development to document taxonomic yeast diversity as well as aspects of ecology, biotechnological applications, pathogenicity, etc. (Boekhout et al. 2021b).
In this paper, we present (1) Current knowledge on yeast diversity, with emphasis on those species described during the last 20 years and the substrates and global regions from where these yeasts were isolated, (2) Estimates on the number of yeast species that might exist, (3) The impact of hybrid species, and (4) Some cases to illustrate how both non-culturable and culturable approaches contribute to increase our knowledge on the extent of yeast diversity. For the latter, we show how metabarcoding contributes to our knowledge of yeast biodiversity in soils and marine waters, as highlighted by the extent of Malassezia yeasts from marine sources, our limited knowledge on yeasts from rapidly disappearing cold habitats, and also the rapidly increasing knowledge on yeast diversity from Asia, here illustrated by China.

Trends in the exploration of yeast diversity: lessons from the past
For about 70 years, the taxonomic compendium TYTS saw five editions published, each providing the then-held views on yeast diversity and classification (Lodder and Kreger-van Rij 1952; Lodder 1970; Kreger-van Rij 1984; Kurtzman and Fell 1998; Kurtzman et al. 2011a). The number of known and accepted yeast species described in the twentieth century grew steadily, reaching > 700 and > 1300 species in the 4th and 5th editions, respectively (Kurtzman and Fell 1998; Kurtzman et al. 2011a, see Table 1, Fig. 3). As an example, the 1st edition contained a total of 166 species divided into the following categories: 15 genera of ascomycetous yeasts (Endomycetaceae), four genera of basidiomycetous yeasts that form ballistoconidia (Sporobolomycetaceae), and nine genera of asporogenous yeasts (Cryptococcaceae) (Lodder and Kreger-van Rij 1952). In contrast, 1301 species were described in the 5th edition of TYTS, belonging to 186 genera, either asexually or sexually defined, within Ascomycota and Basidiomycota (Kurtzman et al. 2011a). We analyzed how our knowledge of yeast diversity developed over time with respect to the broader phylogenetic and reproductive aspects, i.e., whether genera were asexually or sexually defined (Table 1). The first edition of TYTS had only ascomycetous genera with a known sexual state, and the remaining asexual yeasts were classified in asexual lineages, including Cryptococcaceae and Sporobolomycetaceae. In the 2nd edition, the first basidiomycetous yeasts were included (Lodder 1970), and the sexual basidiomycetous yeast genera Leucosporidium and Rhodosporidium were classified in order Ustilaginales, whereas the genera Bullera, Sporidiobolus and Sporobolomyces were placed in Sporobolomycetaceae. The 3rd and 4th editions showed a steady increase of both species and genera (Kreger-van Rij 1984; Kurtzman and Fell 1998). In the 4th and 5th editions a distinction was made between asexual and sexual genera in both Ascomycota and Basidiomycota.
Many sexual genera had an asexual morph according to Art. 59 of the International Code of Botanical Nomenclature (i.e., ICN Vienna Code 2006) (McNeill et al. 2006; Hawksworth et al. 2011). Several asexual genera could be linked to various classes, e.g., Candida (Saccharomycetes), Cryptococcus (Tremellomycetes), Pseudozyma (Ustilaginomycetes), Rhodotorula (Cystobasidiomycetes and Microbotryomycetes), and Tilletiopsis (Exobasidiomycetes). The last edition of TYTS was published in 2011, and, thereafter, the community of researchers exploring yeast diversity moved towards the development of an open access, electronic tool to document yeast diversity, namely theyeasts.org (https://theyeasts.org; Boekhout et al. 2021b). The aim of this platform is to become an ever-updatable database documenting yeast diversity.
Since 2011 (McNeill et al. 2012), the ICNafp has adopted the 'One fungus = One name' nomenclatural principle, and the distinction between asexual and sexual names was abandoned. Presently, only one name is allowed for pleomorphic fungi, and, in the case of yeasts having sexual and asexual morphs, decisions on the preferred generic name still have to be implemented in several genera, e.g., Dekkera vs. Brettanomyces, Geotrichum vs. Galactomyces, and Kloeckera vs. Hanseniaspora. Some other changes have already been made, e.g., Cryptococcus being preferred over Filobasidiella (Hagen et al. 2015; Liu et al. 2015b). As species with and without a known sexual morph can now be classified in the same genus, the number of genera containing both sexually and asexually described species increased considerably (Table 1). Presently, these genera constitute 12 ascomycetous genera that
For about 10 years, the microbial ecology of natural environments and the distribution of microorganisms in human-related habitats (such as fermented food, contaminated soil and water, clinical sources) and natural ecosystems, such as soil, fresh water, and seawater, have been investigated using metabolomics or metabarcoding approaches, enabling insights into the uncultivated biodiversity (see also below). Nevertheless, researchers continue to isolate microorganisms, as cultures are currently the only valid basis for describing new yeast species. These efforts are useful to preserve microbial biodiversity ex situ, allowing subsequent species discovery, and cultured microbial strains also have the potential for monetization (Overmann et al. 2017). Perhaps because the subsequent isolation of a species does not merit a publication, studies reporting the isolation of new taxa outnumber broader culture-based biodiversity surveys, which hampers the growth of knowledge on the (aut)ecology and biogeography of yeasts.
Many yeast species are only known from a limited number of isolates and are often described based on a single strain (Yurkov 2017;Kachalkin et al. 2019).
A few available estimates suggest that only a small fraction (approximately 5-10% depending on the habitat) of the total diversity of fungi is known (e.g., Hawksworth 2001; Blackwell 2011). Most likely, the same is true for yeasts. Yeast species discovery continues at an increasing pace. Boekhout (2005) and Kurtzman et al. (2015) published overviews of the advances made at that time with respect to our knowledge of yeast diversity and taxonomy, and how this contributed to their biotechnological potential. Here we add the species described since 2000 (see below) and reanalyze the data using 20-year periods (Fig. 3). The 3500 published yeast species names belong to roughly 2000-2200 currently accepted yeast species, and several names are presently considered heterotypic synonyms. Some species include many synonyms, e.g., S. cerevisiae with > 80 synonyms and C. albicans with > 170 synonyms (Lachance et al. 2011; Vaughan Martini and Martini 2011). Since the publication of the 5th edition of TYTS in 2011, the number of genera nearly doubled from 186 to 354 (https://theyeasts.org). This is mainly due to the revision of the taxonomy of basidiomycetous yeasts (Liu et al. 2015a, b; Wang et al. 2015a, b, c) in which some large and polyphyletic genera, such as Cryptococcus, Bullera, Pseudozyma, Rhodotorula, Sporobolomyces and Tilletiopsis, were redefined into smaller monophyletic lineages. Thus, since 2011 the number of species increased approximately 1.5 times, while during the same period the number of genera grew by a factor of 1.9. The species-rich and highly polyphyletic genus Candida (Saccharomycotina) still contains approximately 300 species distributed across more than 30 phylogenetic clades that can be linked to several presently accepted genera and 17 unaffiliated clades (e.g., Lachance et al. 2011; Daniel et al. 2014), which indicates that more large-scale taxonomic changes are to be expected in the future.
Based on the data of new species descriptions, the trend in yeast species discovery is best described by a polynomial of order 2 (formula $y = 0.1175x^{2} - 434.91x + 402{,}253$, $R^{2} = 0.9867$) (Fig. 3). Fell (2012) estimated the number of described yeast species as approximately one percent of those that might exist in nature. If correct, this implies that the current number of ca. 2000 accepted yeast species included in theyeasts.org represents only 1%, and the estimated total number of yeast species might reach 200,000. Based on the current rate of species discovery and publication of new species, we might need > 1000 years to describe them all using the above formula. It should be noted, however, that estimates of the total existing number of fungal species vary widely and range from 1 to 5 million, with more conservative estimates resulting in numbers between 2.2 and 3.8 million species (Hawksworth and Lücking 2017). So far, approximately 150,000 fungal species have been described. Applying the ratio between described fungi and described yeast species (150,000:2000) to the conservative estimates of 2.2 to 3.8 million fungal species gives a rough forecast of 30,000-50,000 yeast species that might exist, and it will take at least 340-500 more years to document this yeast species diversity based on the above graph. Lachance (2006) used a more focused measure of potential yeast diversity in two ecological systems, namely yeast-floricolous insects and yeast-plant fluxes. These two models were selected to reflect the degree of specificity of yeast-substrate/host associations. The subsequent estimation resulted in a predicted range of 1500 to 15,000 species for highly specific (symbiont) and generalist models, respectively. These numbers reflect the degree of specificity of yeast communities, diversity in specialized yeast-insect relationships, and tree fluxes that are open to many species. Urbina and Aime (2018) conducted a similar estimate using red yeasts in the order Sporidiobolales.
Using richness and diversity values, and species accumulation curves, the number of species in this order was estimated at 260, of which only 42 were described at the date of publication. This makes a ratio of 1 to 6.2 for described versus estimated species numbers. The authors commented that the 98% sequence similarity threshold used to distinguish between species is conservative and likely reflects a lower estimate of the total species numbers (Urbina and Aime 2018). As considerable differences may exist in evolutionary rates of the various lineages containing yeasts, it might not be possible to extrapolate this ratio to the entire yeast domain. However, if we consider this ratio between described and expected numbers of yeast species of Sporidiobolales as a proxy, this will result in an expected number of around 12,000 yeast species that might exist, which is a much lower estimate than the ones given above. Documenting the diversity of yeasts predicted with this lowest species estimate will take ca. 175 years according to the above polynomial formula. From the DNA-metabarcoding analyses presented below an estimate of 20,000 yeast species was obtained, which will take approximately 250 years of effort to describe if we use the polynomial formula to extrapolate.
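The arithmetic behind these estimates can be retraced with a short script. The following is only a minimal sketch under stated assumptions: it takes x in the fitted polynomial to be the calendar year and y the cumulative number of described species, and it uses 2020 as the reference year for "years needed". The function names and the reference year are illustrative choices, not part of the original analysis. Because the published coefficients are rounded, the resulting time spans are indicative and can differ from the figures quoted in the text by a few decades.

```python
import math

# Rounded coefficients of the fitted second-order polynomial quoted in the text.
A, B, C = 0.1175, -434.91, 402253.0

DESCRIBED_YEASTS = 2_000    # accepted yeast species (from the text)
DESCRIBED_FUNGI = 150_000   # described fungal species (from the text)
REFERENCE_YEAR = 2020       # assumption: starting point for "years needed"


def year_reaching(target: float) -> float:
    """Later root of A*x**2 + B*x + C = target (x assumed to be a calendar year)."""
    discriminant = B * B - 4.0 * A * (C - target)
    if discriminant < 0:
        raise ValueError("the fitted curve never reaches this target")
    return (-B + math.sqrt(discriminant)) / (2.0 * A)


def years_needed(target: float, reference_year: int = REFERENCE_YEAR) -> float:
    """Years between the reference year and the year the curve reaches target."""
    return year_reaching(target) - reference_year


# Total-diversity estimates discussed in the text.
estimates = {
    "1% described (Fell 2012)": DESCRIBED_YEASTS / 0.01,
    "2.2 million fungi": 2.2e6 * DESCRIBED_YEASTS / DESCRIBED_FUNGI,
    "3.8 million fungi": 3.8e6 * DESCRIBED_YEASTS / DESCRIBED_FUNGI,
    "Sporidiobolales ratio (42 of 260)": DESCRIBED_YEASTS * 260 / 42,
}

if __name__ == "__main__":
    for label, total in estimates.items():
        print(f"{label}: ~{total:,.0f} species, ~{years_needed(total):.0f} years")
```

Solving the quadratic for the later root, rather than iterating year by year, keeps the sketch short and makes its sensitivity to the rounded coefficients easy to inspect.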
In conclusion, a major effort is needed to describe unknown yeast species that might exist in nature. All the above estimates, although they vary widely, indicate that at least hundreds of years are needed to document all existing yeast species. It is important to improve sampling approaches and cultivation techniques and apply modern identification pipelines in order to discover and characterize the majority of existing yeast species. Estimation of sampling and cultivation efforts proved to be a useful tool leading to the discovery of new species (reviewed in Yurkov 2017). In addition to media and incubation conditions (reviewed in Boundy-Mills et al. 2006), sample pre-treatment and cultivation strategies can further improve species yield, e.g., using dilution to extinction strategies and automation for barcode analysis (Collado et al. 2007) or culturomic methods adapted for yeast that have been applied mainly for bacteria until now (Diakite et al. 2020). Such activities should preferably be distributed across different continents and ecozones to speed up the discovery of yeast species and other microbes.
There is no doubt that DNA-metabarcoding can be adapted to document the extensive and vast biodiversity of yeasts, but for practical reasons, such as clinical studies and biotechnological applications, it is essential to obtain and study the isolates themselves. Additionally, for identification, DNA-metabarcoding still often relies on short sequence reads of 200-300 bp (but see Heeger et al. 2018), which are minimally informative for phylogenetic reconstructions or species circumscriptions.
To get a meaningful estimate of yeast discovery rates, the species described since 2000 were extracted for each sub-phylum from the nomenclatural repository MycoBank (www.mycobank.org). We retrieved original publications for these species and manually annotated the resulting table to account for several parameters.
• Type of study: applied, related to biodiversity, or clinical. New species can be discovered in a biodiversity survey, as well as in the search for a potential biotechnological agent or in a clinical study.
• Origin of cultures: new isolation from the environment or strains acquired from a culture collection. Culture collections keep a substantial number of living yeast fungi (Boundy-Mills et al. 2016). These holdings include well-characterised strains (e.g., type material) and isolates that may represent potential new species. New species discovered in culture collections are not rare (e.g., Kurtzman 2005, 2007; Kurtzman et al. 2018).
• Type of substrate: air, engineered, freshwater, glacial (snow, ice), and seawater, or related to animals, birds, food, fungi, humans, insects, plants, and soil. When possible, the information about the type of substrate from original publications was taken and merged into 15 larger categories, e.g., the substrates of abiotic (e.g., air, aquatic, soil) and biotic (e.g., animal, insect, plant) types.
• Country and continent. When possible, the information from original publications was used. For the species that were described by analysing several isolates from different countries and/or continents, the term "various" was entered in this field. These analyses accounted exclusively for papers with new yeast descriptions. This does not reflect species endemism as we did not record subsequent isolations of the species.
• Climate: arid (hot semi-arid and desert), cold (alpine, polar, high mountains), boreal, temperate, subtropical, and tropical. Unless specifically indicated in the publications, the information about the locality was used to classify the climate in main climate groups following the Köppen climate classification system (Geiger 1954).
During the annotation, we identified studies which explicitly stated an applied research focus. Also, the annotation included the information about the provider country (the country that owns the genetic resource (here: yeast strain) under the Convention on Biological Diversity, CBD). To respond to growing concerns about exploitation of biodiversity resources in developed countries, we extended the annotation to include the following parameters:
• Provider country: Global South or Global North. Thereby, we were able to analyse the economic background of the countries where new yeast species were discovered, namely developing and developed countries. Rather than considering several heterogeneous parameters and indices, we used a simplified approach and followed the concept of Global South/North. Considering possibilities for doing research and access to modern technologies, China was included in the list of developed countries in our analyses.
• Access to the material: local (in-country) researchers included, or made by foreigners only. To respond to the call for ethical sharing of non-monetary benefits with provider countries, we analysed whether researchers from provider countries were included in the list of authors of publications describing new yeast species.
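The annotation parameters listed above can be pictured as one record per newly described species. The sketch below is purely illustrative: the class name, field names, and allowed values are hypothetical and were chosen to mirror the parameters described in the text; they do not reproduce the table actually used for the analyses.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional

# Hypothetical controlled vocabularies mirroring the parameters in the text.
STUDY_TYPES = ("applied", "biodiversity", "clinical")
CULTURE_ORIGINS = ("new isolation", "culture collection")
CLIMATES = ("arid", "cold", "boreal", "temperate", "subtropical", "tropical")
PROVIDER_GROUPS = ("Global South", "Global North")


@dataclass(frozen=True)
class SpeciesAnnotation:
    """One annotated species description (illustrative structure only)."""
    species: str
    year: int
    study_type: str
    culture_origin: str
    substrate: str
    country: str = "various"      # "various" when isolates span several countries
    continent: str = "various"
    climate: Optional[str] = None
    provider_group: Optional[str] = None
    local_authors_included: Optional[bool] = None

    def __post_init__(self) -> None:
        # Reject values outside the controlled vocabularies.
        if self.study_type not in STUDY_TYPES:
            raise ValueError(f"unknown study type: {self.study_type!r}")
        if self.culture_origin not in CULTURE_ORIGINS:
            raise ValueError(f"unknown culture origin: {self.culture_origin!r}")
        if self.climate is not None and self.climate not in CLIMATES:
            raise ValueError(f"unknown climate: {self.climate!r}")
        if self.provider_group is not None and self.provider_group not in PROVIDER_GROUPS:
            raise ValueError(f"unknown provider group: {self.provider_group!r}")


def percentage_by(records: Iterable[SpeciesAnnotation], field_name: str) -> Dict[object, float]:
    """Share (in percent) of records for each value of the given field."""
    counts = Counter(getattr(record, field_name) for record in records)
    total = sum(counts.values())
    return {value: 100.0 * n / total for value, n in counts.items()}
```

With records of this kind, a call such as `percentage_by(records, "study_type")` would yield the type of breakdown reported for biodiversity, applied, and clinical studies.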
According to the researched sources, 1193 yeast species were described in the period from 2001 to 2019. This number does not include new taxonomic combinations or discoveries of a sexual morph, even though the latter were considered as new species according to Art. 59 (two morphs described in different genera) prior to the ICNafp Melbourne Code (McNeill et al. 2012). The annual number of yeast species described varied from 33 in 2002 to 92 in 2004. Most novel yeast species were discovered in biodiversity assessments (89.5%). New species described from applied and clinical studies accounted for only 6.2% and 4.2%, respectively.

Geographic patterns of yeast distribution
Approximately 63 percent (n = 673) of the novel species described originated from warm climates (tropical, subtropical, and hot arid and semi-arid). The number of novel yeasts from tropical and subtropical climates was nearly the same, 28.9% and 29.5% (310 and 316 species, respectively), but only 47 species (4.4%) originated from arid climates. Of the 398 species (37.2%) described from cooler climates (temperate, boreal, and cold), the vast majority was isolated from temperate climates (27.5%), whereas boreal and cold climate regions each yielded just under 5% of the new species (52 and 51 species, respectively). Therefore, studies from temperate, subtropical, and tropical climates accounted for most published novelties in the yeast domain. Arid, boreal, and cold climates remain undersampled.
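The climate shares quoted above can be checked with a few lines of arithmetic. This is a minimal sketch, assuming that the percentages are taken relative to the sum of the six climate counts (1071 species), which is an inference from the reported figures rather than something stated in the text; the temperate count of 295 is the value given later in this paper. The variable and function names are illustrative.

```python
# Species counts per climate group as reported in the text.
climate_counts = {
    "tropical": 310,
    "subtropical": 316,
    "arid": 47,
    "temperate": 295,
    "boreal": 52,
    "cold": 51,
}

WARM_CLIMATES = ("tropical", "subtropical", "arid")


def share(count: int, total: int) -> float:
    """Percentage of total, rounded to one decimal place."""
    return round(100.0 * count / total, 1)


# Assumed denominator: all species that were assigned to a climate group.
total_species = sum(climate_counts.values())
warm_species = sum(climate_counts[name] for name in WARM_CLIMATES)
cooler_species = total_species - warm_species

if __name__ == "__main__":
    print(f"warm climates: {warm_species} species ({share(warm_species, total_species)}%)")
    print(f"cooler climates: {cooler_species} species ({share(cooler_species, total_species)}%)")
    for name, count in climate_counts.items():
        print(f"{name}: {count} species ({share(count, total_species)}%)")
```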
The observed geographic distribution of species discoveries is uneven, similar to observations made before (Kurtzman et al. 2015; Yurkov 2017). Yeast species were described from 81 countries, which we grouped according to the number of species described since 2001. A total of 149 species were described from China (and 49 more from Taiwan). The USA and Thailand have been home to over 100 new yeasts (134 and 131 species, respectively), followed by Brazil (93), Japan (68), Portugal (33), and Panama (31). Other countries contributed to the global yeast diversity with fewer than 30 species described since 2001. A total of 122 species were isolated from more than one country. Neutral territories, such as Antarctica and international waters, yielded 19 new yeast species.
In the last 20 years, researchers from Asia were most active in describing yeast diversity. Nearly half of the newly described species were sampled in Asia (43.2%, 480 species), and the four aforementioned Asian countries accounted for 82.7% (397 species) of the diversity described from the continent. These numbers would be even greater if the 107 species from China described in a single publication were considered (Li et al. 2020, see below). Together with the 227 and 208 species (20.5% and 18.7%) described from the Western world (Europe and North America, respectively), these three continents accounted for 82.4% of the global diversity of newly described yeast species. Out of the 12.2% of total new yeasts (135 species) discovered in South America, taxonomic discoveries from Brazil strongly dominated with 68.9% of the diversity from that continent. Only 35 and 17 species were described from Africa and Antarctica, respectively. The Australasian region is the most undersampled place for yeasts, with only eight species discovered since 2001. A total of 78 species were discovered on more than one continent.

Species discovery rates
The number of known yeast species included in "TYTS" was growing at a speed of approximately 10 species per year in the period 1952-1984 and 14 species per year until the publication of the fourth edition in 1998 (Lachance 2006). Application of advanced nutritional tests and early molecular approaches used from the first to the third edition of the book series did not significantly change species discovery rates. But the early application of DNA sequencing in yeast identification and phylogenetic reconstruction increased the speed of descriptions of new species to 14 species per year in the fourth edition and 54 species per year in the fifth edition of the compendium (Kurtzman and Fell 1998; Kurtzman et al. 2011a). The average species discovery rate in the analysed period was 63 species per year. Since 2011, species were described at a rate of 60 species per year. An increase in species discovery rates (annual values) occurred in 2004, 2008, 2010-2013 and 2019. The first peak was associated with growing discoveries from North America, Asia and Europe, whereas later peaks resulted solely from increased research activity in Asia. The lowest discovery rates were observed in 2002 and 2014 (33 and 34 species, respectively). Both low points correlated with decreasing numbers of new yeasts described from North America. The second minimum resulted from globally declining descriptions in Asia, Europe and the Americas in the years 2013 and 2014. However, this trend has changed due to increasing research activities in Asia, Europe and South America. Starting from 2019, the number of newly described species is rapidly growing, and we expect the growth will continue in 2020 considering that more than 100 new species from China were described in a single study (Li et al. 2020). The number of species described from North America declined from 2005, while South American discoveries slightly increased in the very same period.
Starting from 2015, new species discoveries from Europe increased, reaching the highest recorded value of 24 species in 2019.
Among commonly studied types of substrates, most species described in the last two decades were found on plant material (41.6%) or in association with insects (17.1%). Following the two biotic substrates, soils were the most promising substrate for species discoveries (11.6%). Yeasts from food, humans, aquatic (both freshwater and marine), and animal sources accounted for between 1.9 and 5.2% of the taxonomic novelties.

Substrates as source of novel yeasts
Plant substrates were the most frequent source of new species in Global North and South countries. From 2010 onward, the numbers of descriptions of yeasts from this substrate increased in countries of the Global South, from just 6 (in 2010) to 24 (in 2013). Among abiotic substrates, soil was mainly studied in countries of the Global North, while soil yeasts in many developing countries are understudied. Two biotic types of substrates, viz., plants and insects, were the primary source of new species from countries of the Global South, although new yeasts isolated from insect-related sources were largely described in 2004-2006 and their numbers have declined since then. Researchers who discovered yeasts in association with insects were also active in countries of the Global North in the same period and again later in 2016. Since 2007, new species from plant-related sources strongly prevail over isolations from other sources in the Global South.

Biodiversity hotspots and endemism
Biodiversity hotspots are defined as areas with high diversity and endemism of animals and vascular plants (Myers et al. 2000). An extrapolation of plant and animal diversity values to microorganisms is arguably possible if we consider that microbes are strictly associated with their hosts, for example as parasites and symbionts. Although endemism has been demonstrated in yeasts, there are many species that are not restricted to a particular region or substrate (reviewed in Yurkov 2017). Similarly, species discovery rates do not directly depend on the local biodiversity. While diversity of plant species in the tropical zone is undoubtedly higher than in the temperate zone, the numbers of novel yeasts discovered in these climates were similar, 310 and 295, respectively. Despite the bias in our dataset, which does not account for attempts to find new species in specific areas during the observed period, our observations suggest that the diversity of land plants is not a good predictor of undiscovered species of yeasts. The distribution patterns of yeasts in different regions and climates vary considerably (reviewed in Yurkov 2017). It has been convincingly demonstrated that yeast communities on plant surfaces (phylloplane) are comprised of both widespread cosmopolitan and endemic species (reviewed in Fonseca and Inácio 2006; Kelmer et al. 2017; Limtong and Nasanit 2017). Thus, a successful isolation of a new yeast species requires an appropriate sampling effort and isolation strategy (Lachance and Starmer 1998; Yurkov and Pozo 2017). Although a few interesting plant-specific yeast associations do exist, e.g., Dimennazyma cistialbidi on Cistus albidus and members of the genus Carlosrosaea on bromeliads, to name a few, interesting yet undiscovered yeasts may be present in the environment in small numbers as minor community members (reviewed in Yurkov and Pozo 2017).
Similar to phylloplane yeasts, many soil yeasts are widespread, e.g., Apiotrichum porosum Stautz, Saitozyma podzolica (Babeva & Reshetova) Xin Zhan Liu, F.Y. Bai, M. Groenew. & Boekhout, and Solicoccozyma terricola (T.A. Pedersen) Yurkov (reviewed in Yurkov 2018; see also below). However, a few recent studies reported a high proportion of potential novel yeast species from soils. Whether this trend is restricted to forest soils in the temperate and Mediterranean climate zones needs further research. New species from tropical soils are still rare due to insufficient sampling activities. Species of clinical relevance (isolated from humans) were more often described in the Global North than in Global South countries, namely 25 versus 8, respectively. Likely, this is a result of better research facilities and possibly greater awareness in the former area.

Impact of researchers on yeast species discovery
The role of individual researchers, such as their places of work and preferences for a particular type of substrate, was visualized using species records in MycoBank. We analysed the origin of new discoveries made by the 25 most frequent authors of new species. Marc-André Lachance, Takashi Nakase, and Carlos Rosa (and their collaborators) have each described more than 100 species of yeasts in the last two decades; twelve more authors described more than 50 species.
Fig. 4 (a) Network of authors of new yeast species; author clusters (Newman 2006) are shown in different colors. Node sizes show the number of described species per author. Authors with fewer than 5 contributions in the respective time period were excluded from the calculation. Authors without connections to the large network were excluded from the network layout. (b) Flow chart showing the contributions of the calculated author clusters to new yeast species descriptions per year. The author with the most contributions per cluster is named, together with the total number of contributions of the whole cluster between 2001 and 2019. The colors correspond to the groups in the network. The width of the lines corresponds to the number of species described by the cluster, isolated from each substrate (2nd block), described from each economy zone (3rd block), and from each origin (4th block)
A network analysis was used to display the scientific cooperations that yielded the majority of species during the last two decades (Fig. 4). In many cases these research collaborations were international. Plant-related substrates were sampled by nearly all leading research groups. Insect-related sources, including insect frass and tunnels made by wood-boring insects, were mostly surveyed by researchers forming three network clusters, largely built by scientists from Brazil, Canada, Japan, Thailand, and the USA. These working groups studied plant- and insect-related yeasts equally well in countries of the Global North and Global South. As mentioned above, soils were more intensively studied in the Global North by researchers from Germany and Portugal. Most discoveries of new species from human-related, including clinical, sources were made by researchers from one cluster, which is built by scientists from Japan and Thailand.

Challenges for yeast biodiversity exploration
Biodiversity loss due to overexploitation is a growing concern in many countries. Many sovereign states established mechanisms for conservation of their genetic resources, including microorganisms, through their national legislations and multilateral agreements like the Convention on Biological Diversity (CBD), the Nagoya Protocol (NP) and some other treaties. Both CBD and NP have created new challenges for international microbiological research. One of the major goals of the CBD is to link traditional conservation efforts to economic development using biological resources. The NP entered into force on 12 October 2014 and has dramatically changed biological research by creating a new and complex set of administrative and legal hurdles for researchers studying biodiversity, both preserved ex situ and newly isolated from their natural habitats (Boundy-Mills et al. 2016; Yurkov et al. 2019). Bioprospecting, which offers companies or individual scientists benefits from the biological resources found around the world, has long been a concern of developing nations with a rich biodiversity (Artuso 2002; Hamilton 2006; De Jonge 2011). The CBD and NP do not distinguish between basic and applied research, or between commercial and non-commercial use. Therefore, nearly all routine microbiological biodiversity research, dependent on environmental sampling, isolation, identification and subsequent taxonomic classification, is covered by both CBD and NP depending on the national legislation where and when access (sampling) took place. Isolation of microorganisms, including yeasts, and their appropriate characterisation for taxonomic purposes lies on the borderline between biodiversity research and bioprospecting. The extent of research on yeast products (enzymes, pigments, secondary metabolites) and applications (biocontrol, fermentation) using newly sampled isolates compared to biodiversity research is largely unknown, but likely quite extensive.
Examples of recently described yeast species that have found applications include Spathaspora spp. for biofuel and Saccharomyces eubayanus Samp., Libkind, Hittinger, P. Gonçalves, Valério, C. Gonçalves, Dover & Johnst. for brewing. But even biodiversity research on yeasts, such as a taxonomic inventory or description of a new species, though this is the essential part of the CBD, is regulated by national CBD and NP-CBD legislations. The CBD and NP have a temporal scope (a ratification date) but some countries extend the regulations to 'old' resources in ex-situ collections. As such, it is not unlikely that some old strains preserved outside their country of origin may be subjected to a national legislation (see also Yurkov et al. 2019; Aime et al. 2021). Cultures from 200 countries and regions are preserved in culture collections, which preserve biodiversity ex situ. Some of these yeasts represent yet undescribed species. Cultures which were available from culture collections were used in 125 species descriptions in Global North countries, compared to 28 species originating from countries of the Global South.
Taxonomic research and discoveries of new species are fundamental to understand biodiversity and conservation (McNeely 2002). The CBD Global Taxonomic Initiative is aimed at minimizing the knowledge gaps in taxonomic systems, including sharing technologies for identification and subsequent description of taxonomic novelties with researchers from developing countries, also referred to as Global South. We observed several trends when comparing countries with different economies. Most novel species were described from countries of the Global North (62.2%), including China, USA, and Japan. Among countries of the Global South (31.1% of newly described species), Thailand and Brazil yielded most novel species. Yeast diversity in countries of the Global South was extensively sampled in hot, mainly tropical climate zones and less in subtropical climate zones. Research in the Global North focused on subtropical and temperate climates, e.g., in the Mediterranean forests, woodlands, and scrub biome and Temperate broadleaf and mixed forests biome (Inácio et al. 2002;Yurkov et al. 2016b).
Our analyses demonstrated that expert skills essential for identification of yeasts, including new species, are either available in the most active biodiversity-rich countries of the Global South or were obtained through the established expert network (Fig. 4). The aim of the access and benefit-sharing framework of the CBD is to overcome the inequality between developing and developed countries, including, among others, non-monetary benefits such as technology transfer and participation in research and publications. We addressed the participation of researchers in yeast species discoveries. The majority of new species descriptions of yeasts were made by researchers in their own countries, alone or in collaboration with a few foreign experts (Fig. 4). The number of species from countries that were described solely by foreign researchers was almost the same in Global South and Global North countries, 86 and 78, respectively. These numbers combined constitute just 13.7% of the total number of new species described in the period 2001-2019. Of them, a total of 67 (78%) and 38 (49%) descriptions were made using strains newly isolated from the environment in Global South and Global North countries, respectively. Biodiversity surveys clearly dominated over bioprospecting and clinical studies. Furthermore, most of the descriptions were made in a biodiversity survey involving inland researchers, and only a few foreign scientists discovered new species during applied or clinical studies (Suppl. Table SI1). Yeast discoveries during applied research performed by researchers in their home countries (64 species, 39%) are seven times more common than those by foreign researchers (9 species, 5.5%).
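The shares quoted in the preceding paragraph can be re-derived from the counts given there. The following Python sketch is only an arithmetic check: the variable names are ours, and the total number of descriptions is back-calculated from the reported 13.7% rather than taken from the underlying dataset, so it should be read as an approximation.

```python
# Counts quoted in the text: species described solely by foreign researchers.
foreign_south = 86
foreign_north = 78
foreign_total = foreign_south + foreign_north

def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole

# Descriptions based on strains newly isolated from the environment.
share_new_south = percent(67, foreign_south)   # reported as 78%
share_new_north = percent(38, foreign_north)   # reported as 49%

# Assumption: the 13.7% refers to foreign_total, which lets us
# back-calculate the approximate total number of descriptions.
implied_total = foreign_total / 0.137

# Applied-research discoveries: home-country versus foreign researchers.
home_to_foreign_ratio = 64 / 9                 # reported as "seven times"

print(foreign_total, round(share_new_south), round(share_new_north))
print(round(implied_total), round(home_to_foreign_ratio, 1))
```

The back-calculated total of roughly 1,200 descriptions is an inference from the rounded percentage and is not a figure stated in the text.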

General trends in the discovery of new yeast species
The analyses shown above indicate that the diversity of described yeast species is steadily growing with an almost constant rate in the years from 1998 to today. The majority of new discoveries were made in Asia, North America and Europe by local scientists or in research cooperation. We did not find evidence for unfair sharing of research results and unethical exploitation of local biodiversity in developing countries. Leading countries in the Global South, Thailand and Brazil, have capacities to perform high-quality biodiversity research independently or in cooperation with taxonomists worldwide. Most of the activity in new species discoveries was directed to tropical, subtropical and temperate climates. Arid and cold climates remain undersampled, but they represent a promising source of new yeast species. So far, most promising sources of new species included plant-and insect-related substrates, and soils to a lesser extent. Taxonomic novelties from aquatic, animal and human (including both clinical and food) sources, and yeasts isolated in association with fungi, including lichens, are not unusual, but their numbers are far behind. Plant material and insect-related habitats yielded most of the yeast species described in the last two decades. Although both types of substrates seem to have been well-sampled in the past, we expect more species discovered from either plants or insects as many areas and plant/animal species have not yet been sampled. The recent description of more than 100 yeasts from China by Li et al. (2020) increased the diversity of basidiomycetous yeasts by 20%, and most of these species were isolated from the phylloplane. Yeasts associated with insects were intensively studied during the last two decades by Blackwell, Ganter, Lachance, Rosa, Starmer, Suh and their collaborators. The range of habitats included ant gardens, flowers, mushrooms, and decaying wood, fruits and cacti. 
In 2005, Boekhout reported that only 6% of yeasts in the CBS collection of the Westerdijk Fungal Biodiversity Centre were from insect sources, but these numbers increased in 10 years reaching 7.25% after the insect-derived yeast collection of Meredith Blackwell was included in the CBS collection (Blackwell 2017;Groenewald et al. 2017).
Insufficient sampling intensity and geographical sampling bias compromise our knowledge of natural yeast communities. Despite overall intensive sampling of yeasts in substrates associated with insect activity, most large studies were regional, e.g., on Neotropical floricolous insects or mushroom-feeding beetles in North America (reviewed by Blackwell 2017). The high degree of specificity of yeast-insect interactions makes it likely that future studies in not-yet-sampled regions or insects will yield many new yeast species. Flower nectar, a highly selective habitat, is usually inhabited by a few specialized yeasts (reviewed in Mittelbach and Vannette 2017). Nevertheless, flowers and nectar represent a more widespread and a more complicated system than rotting cacti and tree fluxes. A combination of host plant and animal vectors enhances geographic endemism of flower yeasts within this system (reviewed in Yurkov 2017). Substrate selectivity steered by plants (e.g., sugar composition and concentration, and toxic compounds), species inhibition through primary effects and syntrophy are other important factors creating diversity in this highly specific habitat (Mittelbach and Vannette 2017). Thus, sampling in a presumably well-characterised habitat can sometimes result in unexpectedly high species diversity (e.g., Mittelbach et al. 2015) and reveal species that are new to science (Passer et al. 2019).
Some species-poor substrates, like soils, are characterised by a patchy, uneven community structure, meaning that the number of species will likely increase along with the number of analysed samples. The importance of sampling strategy for biodiversity studies has been repeatedly emphasized (reviewed in Lachance and Starmer 1998; Boundy-Mills 2006; Yurkov and Pozo 2017). Although just a few species are usually isolated from a single soil sub-sample or plate, a thorough sampling of this habitat can yield as many as 100 species, as shown for beech forests located in three areas of Germany (Yurkov et al. 2016a). Research emphasis on certain systems, substrates, or regions might be due to multiple reasons, including accessibility, interest, or grant strategies of funding organisations. Moreover, activities of researchers or working groups relate to peaks of newly described yeast species.

Cryptic and hybrid species
The existence of cryptic and hybrid species challenges any attempt to accurately estimate species diversity. These issues have long been recognized but have definitely come to the surface with the advent of whole genome sequencing (Naranjo-Ortiz and Gabaldón 2019). The occurrence of hybridization can blur the limits of defined species by establishing gradients of genetic flow between different species and by establishing new, chimeric lineages (Mallet 2007; Morales and Dujon 2012; Leducq et al. 2016; Gabaldón 2020). Yeasts used in industrial processes were among the first microbes in which hybrids were recognized. The hybrid nature of the lager beer yeast Saccharomyces pastorianus Reess, for instance, was first established through DNA re-association studies (Vaughan-Martini and Martini 1987). Later, genomics analyses have uncovered numerous hybrids in different niches, which encompass a growing number of yeast clades (Table 2). Most known hybrids have been identified among Saccharomycotina yeasts thriving in industrial environments or other human-related niches, perhaps because they are the most intensively studied yeasts. These include Saccharomyces hybrids isolated from wine, cider, beer or other products (Mixão and Gabaldón 2020; Mixão et al. 2021a).
Table 2 List of yeast genera and species comprising hybrids. Species in which several strains have been analyzed and are hybrids (i.e., likely hybrid species) are highlighted in bold. For each species or genus, the scientific name, the taxonomic family and a literature source are indicated. Intervening shadowed rows indicate the taxonomic subphylum and phylum for the subsequent species and genera. (1) Most (~80%) of the analyzed isolates are hybrids. (2) Only one isolate per species has been analyzed. (3) Low divergence of the two parental populations (0.6%), suggesting intra-specific hybridization
Although less explored, environmental hybrid isolates of Saccharomycotina yeasts have been described in Saccharomyces (Leducq et al. 2016) and Metschnikowia (Venkatesh et al. 2018). New hybrid strains are likely to emerge, often unexpectedly, as we continue exploring the genomic diversity of Saccharomycotina. For instance, a recent phylogeny-oriented survey of budding yeast genomes identified Citeromyces siamensis Nagats., H. Kawas., Limtong, Mikata & Tats. Seki and Martiniozyma abiesophila (Kurtzman) Kurtzman as potential hybrids based on their genomic signatures (Shen et al. 2018). Beyond those in Saccharomycotina, other hybrids in the Ascomycota include some strains of the halotolerant black yeast Hortaea werneckii (Horta) Nishim. & Miyaji (Pezizomycotina) (Gostinčar et al. 2018) and Schizosaccharomyces pombe Lindner (Taphrinomycotina) (Tusso et al. 2019). In the latter, ancestral admixture between two populations diverging ~0.6% at the nucleotide level (likely within the species boundaries) seems to explain most of the diversity of current populations, suggesting a scenario of ancestral intra-species hybridization rather than continuous genetic exchange. Finally, hybrid species or strains have also been identified in basidiomycetous yeasts, hitherto mostly in clades comprising human pathogens, such as the Cryptococcus neoformans/Cr. gattii complex (Agaricomycotina) (Boekhout et al. 2001; Samarasinghe and Xu 2018), trichosporonoid yeasts (Agaricomycotina) (Takashima et al. 2019; Aliyu et al. 2020), and Malassezia (Ustilaginomycotina) (Wu et al. 2015; Theelen et al. 2021).
The implications of hybridization in defining the diversity within a lineage are manifold. First, hybridization can be a means of speciation, initiating new lineages that have characteristics that are not necessarily intermediate between those of the parents and that drive the colonization of new lineages (Gabaldón 2020). Second, hybrid lineages can start recurrently, perhaps in different geographical areas, by the convergent mating of the same two divergent lineages, as is the case for C. orthopsilosis (Schröder et al. 2016). Finally, hybrids can often only reproduce clonally, and their genomes are highly plastic, which results in rapid diversification into highly differentiated lineages (Gabaldón 2020). Such has been the case for C. albicans and the closely related C. africana and C. stellatoidea, considered as different or the same species depending on the authors. The three are the result of extreme diversification following a single hybridization event, with the latter two (sub)species resulting from independent, parallel and massive loss of heterozygosity events (Mixão and Gabaldón 2020; Mixão et al. 2021a). Altogether, hybridization, which seems a common and important phenomenon in fungi (Steensels et al. 2021), promotes diversification both at the genetic and phenotypic levels, blurs species delimitations, and can often be associated with the existence of cryptic species, poorly supported phylogenetic relationships, and species complexes.

Case studies
Microorganisms are the most diverse and abundant form of life on Earth and are present in every possible type of niche. More than 99% of potentially 10¹¹-10¹² microbial species are unknown and only a small fraction has been obtained in pure culture (Locey and Lennon 2016; Bodor et al. 2020). There are several reasons why certain microorganisms cannot be cultivated using current techniques, e.g., some are scarce and slow growing, and others are demanding and need specific growth requirements or interactions with certain microbes or other organisms. The present limited availability of pure microbial cultures stimulates future studies that will generate more details of those missing species, i.e., knowing the unknown, and their physiological properties. DNA-metabarcoding is widely used to document the extent of microbes in many habitats, but this informs us only indirectly and partially about their metabolism. In contrast, metagenomics studies investigate the diversity of unculturable microbial life using genome data and, hence, also contribute to our knowledge of metabolic potential (Bodor et al. 2020; Lücking et al. 2021). Here, we present two cases of metabarcoding studies that highlight how this approach contributed to our understanding of the diversity of soil-borne and marine yeasts. These two cases address the possibilities and limitations of the use of DNA-metabarcoding to estimate yeast diversity. The first case, on soil yeasts, yields further insight into the estimate of unknown yeasts that might occur in this habitat, and the second addresses the putative extensive diversity of Malassezia species that occurs in marine waters. A third case presents the extent of cultivable yeasts present in a little-studied environment, namely cold habitats such as glaciers, that may hold promise for applications, e.g., by finding enzymes that work optimally at low temperatures. Finally, a fourth case highlights the rapid increase of the diversity of cultivable yeasts in Asia, in this case China, where many new species have been found lately.
To improve the yeast identification of the OTUs, we reclassified the OTUs from Tedersoo et al. (2014) using the yeast CBS barcode dataset in which the sequences were manually checked and updated with the taxonomic classification from MycoBank (https://www.mycobank.org/; Robert et al. 2013; Vu et al. 2016). More specifically, the ITS2 sequences of the CBS yeast barcode dataset were extracted using the software ITSx 1.1.1 (http://microbiology.se/software/itsx) to create a reference dataset of 4,436 sequences representing 1,242 species, 197 genera, 57 families, 24 orders, and 13 classes. When reanalyzing the data provided by Tedersoo et al. (2014) we used the same cut-off values of 0.85, 0.8, and 0.75 at the taxonomic levels of family, order, and class, respectively. However, at the species and generic levels the similarity cut-offs used for Ascomycota and Basidiomycota were predicted as described in Vu et al. (2016) and Boekhout et al. (2021b), in which the minimum coverage for the ITS2 sequences was given as 100 bp. For Ascomycota, the predicted cut-offs to identify the sequences at the species and genus levels were 0.986 (with a confidence of 0.88) and 0.936 (with a confidence of 0.6), respectively. For Basidiomycota, they were 0.989 (with a confidence of 0.94) and 0.926 (with a confidence of 0.85), respectively. The analysis shows the utility of lineage-specific sequence similarity analyses compared to commonly used, arbitrarily defined cut-off values. We want to emphasize that the sole use of such cut-off values for taxonomic purposes, e.g., the recognition of species, is strongly discouraged as they may vary between lineages and clades and need to be accompanied by supporting data from ecology, genetics, physiology, and so on.
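As an illustration of how the lineage-specific cut-offs quoted above could be applied to a best BLAST hit, consider the following minimal Python sketch. It is not the pipeline used in the analysis; the function name, the dictionary layout, and the decision to discard hits below the 100 bp coverage are our assumptions.

```python
# Similarity cut-offs quoted in the text for ITS2 sequences.
SPECIES_GENUS_CUTOFFS = {
    "Ascomycota": {"species": 0.986, "genus": 0.936},
    "Basidiomycota": {"species": 0.989, "genus": 0.926},
}
# Cut-offs shared by both phyla at the higher ranks.
HIGHER_RANK_CUTOFFS = {"family": 0.85, "order": 0.80, "class": 0.75}
RANKS = ["species", "genus", "family", "order", "class"]
MIN_COVERAGE_BP = 100  # minimum ITS2 coverage given in the text

def lowest_supported_rank(phylum, similarity, coverage_bp):
    """Return the most specific rank supported by a best hit, or None.

    Assumption: hits covering fewer than MIN_COVERAGE_BP are discarded.
    """
    if coverage_bp < MIN_COVERAGE_BP:
        return None
    thresholds = {**SPECIES_GENUS_CUTOFFS[phylum], **HIGHER_RANK_CUTOFFS}
    for rank in RANKS:  # walk from most to least specific rank
        if similarity >= thresholds[rank]:
            return rank
    return None

# The same similarity can support different ranks in different phyla.
print(lowest_supported_rank("Ascomycota", 0.987, 150))
print(lowest_supported_rank("Basidiomycota", 0.987, 150))
```

The two calls show why lineage-specific values matter: a similarity of 0.987 passes the species cut-off for Ascomycota but only the genus cut-off for Basidiomycota.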
It is important to note that 83 (~6.68%) of the yeast species in the reference dataset were indistinguishable by ITS2 (the sequences of these species fell in the same group as those of other species when clustering the dataset with a 100% similarity score), including some prominent soil yeasts. These species were removed when predicting a similarity cut-off for yeast species identification to obtain an optimal resolution. Yeast identification improved for the OTUs using the described reference dataset and BLASTn (Altschul et al. 1977, Fig. 5). A total of 1624 OTUs from the reference dataset were identified as belonging to the yeast taxonomic classes. More specifically, 133, 385, 536, 1302, and 1612 OTUs were identified at the species, genus, family, order, and class level, respectively, of which 87 (65.4%), 94 (24.4%), 194 (36.2%), 775 (59.5%), and 941 (58.3%) OTUs were newly identified. Among the 46, 291, 342, 527, and 671 OTUs successfully identified by both the UNITE + INSDC and CBS datasets at the species, genus, family, order, and class level, respectively, 18 (39.1%), 87 (29.9%), 161 (47.1%), 354 (67.2%), and 580 (86.4%) OTUs had the same name, and 28 (60.9%), 204 (70.1%), 181 (52.9%), 173 (32.8%), and 91 (13.6%) OTUs were updated with a new name, of which 28 (100%), 153 (75%), 105 (58%), 116 (67%), and 56 (61.5%) OTUs received a higher identification score. To avoid the problem of wrong identification due to the lack of reference sequences, all OTUs that were identified with a new name and that had a lower score than the score obtained previously were removed. In the end, 1497 OTUs were selected, of which 133 (9%), 331 (22%), 437 (29%), 1197 (80%) and 1487 (99.3%) OTUs were identified to 106 species, 86 genera, 42 families, 23 orders, and 11 classes.
Several reasons may explain why 91% of the 1497 OTUs were not identified to the species level, namely (i) the lack of reference sequences; (ii) the cut-off used was too strict; (iii) sequencing artifacts from NGS platforms are more frequent than those from Sanger sequencing; or (iv) they are potentially new species. If we assume that most currently accepted yeast species do have a reference sequence, and that the second and third points above play a less important role, it seems that around 10% of the soil yeast species have been described. If we take this as a proxy for the entire yeast field, which is also supported by the fact that soils are a very prominent source of new yeast species (see above), this leads to an estimate of 20,000 yeast species that might exist.
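The extrapolation above is a simple proportion, which the following Python sketch makes explicit. The number of currently described yeast species is not stated in this section; the value of 2,000 used below is an assumption, chosen because it is the figure implied by combining the 10% proportion with the estimate of 20,000 species.

```python
# OTU counts quoted in the text.
otus_selected = 1497
otus_identified_to_species = 133

# Fraction of OTUs identified to species level (about 9%, i.e. ~10%).
identified_fraction = otus_identified_to_species / otus_selected

# Assumption, not a value from the text: number of described yeast species.
described_species_assumed = 2000
rounded_fraction = 0.10  # "around 10%" as used in the text

# If described species are ~10% of all species, the total is described / 0.10.
estimated_total_species = described_species_assumed / rounded_fraction

print(f"identified fraction: {identified_fraction:.3f}")
print(f"estimated total: {estimated_total_species:.0f}")
```

Using the unrounded fraction (about 0.089) instead of 0.10 would give a somewhat higher total, which illustrates how sensitive the estimate is to the assumed proportion.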
OTU richness and read abundance of the selected yeast OTUs in all biotope types studied by Tedersoo et al. (2014) are presented (Fig. 6). Here richness means the number of entities in question that can be referred to OTUs, species, genera, etc., while read abundance means the number of the sequencing reads of the entities. It is interesting to see that there was a high correlation of 0.84 between the yeast OTU richness and read abundance of the biotope types. Temperate deciduous forests (TDF), southern temperate forests (STF), temperate coniferous forests (TCF), boreal forests (BF), moist tropical forests (MTF), tropical montane forests (TMF) had relatively high yeast OTU richness and read abundance (516 and 7805 for TDF, 444 and 13,734 for STF, 434 and 5360 for TCF, 384 and 5541 for BF, 346 and 6322 for MTF, 262 and 4621 for TMF), while dry tropical forests (DTF), grassland and shrubland (GS), savannas (SAV), arctic tundra (AT), mediterranean (MED) had relatively low yeast OTU richness and read abundance (83 and 794 for DTF, 94 and 498 for GS, 138 and 1201 for SAV, 169 and 3261 for AT, 182 and 1597 for MED).
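The reported correlation can be checked against the per-biotope values listed above. The sketch below assumes that a Pearson correlation coefficient was meant, which the text does not state explicitly; the helper function and variable names are ours.

```python
import math

# (OTU richness, read abundance) per biotope type, as quoted in the text.
BIOTOPES = {
    "TDF": (516, 7805), "STF": (444, 13734), "TCF": (434, 5360),
    "BF": (384, 5541), "MTF": (346, 6322), "TMF": (262, 4621),
    "DTF": (83, 794), "GS": (94, 498), "SAV": (138, 1201),
    "AT": (169, 3261), "MED": (182, 1597),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

richness = [r for r, _ in BIOTOPES.values()]
abundance = [a for _, a in BIOTOPES.values()]
r = pearson(richness, abundance)
print(f"Pearson r = {r:.2f}")
```

With the eleven value pairs given in the text, the coefficient rounds to 0.84, in agreement with the reported correlation.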
Here, we present the yeast diversity present in the different biotope types at the species and genus levels (Fig. 7). Saitozyma podzolica (Babeva & Reshetova) Xin Zhan Liu, F.Y. Bai, M. Groenew. & Boekhout, the most abundant species with a read abundance of 9287 (18.3%), was found in all biotopes with an abundance from 15 in GS to 3081 in MTF. The second most abundant species, So. terricola (7699, 15.2%), was found in most of the biotopes with a relatively high abundance of more than 68, except for DTF (0), SAV (1), and MTF (11), followed by the So. terrea - So. phenolicus species complex (2274, 4.5%), found mainly in STF (816), TDF (621), MED (451), TCF (214), SAV (58), GS (53), and TMF (52). The A. porosum - A. xylopini species complex and Solicoccozyma aeria were also two abundant species revealed in the soil samples, with 1456 (2.9%) and 1417 (2.8%) of the abundance, respectively. The A. porosum - A. xylopini species complex formed a major group in MTF (7.5%) and TMF (5.5%), while Solicoccozyma aeria (Saito) Yurkov did so in MED (10.3%) and STF (6.2%). The remaining 101 species had low OTU abundances, i.e., < 0.78% in all biotopes.
Fig. 5 Identification of the OTUs by the UNITE + INSDC and CBS datasets. The green color shows the number of OTUs that were identified with the same name by both datasets. The brown color shows the number of OTUs that were identified by the associated dataset with a different name and a higher score than the other dataset. The pink color shows the number of OTUs that were identified by the associated dataset with a different name and a lower score than the other dataset. The blue color shows the number of OTUs that were only identified by the associated dataset
Fig. 6 The richness and read abundance of the yeast OTUs in soils of the following biotope types: arctic tundra (AT), grassland and shrubland (GS), dry tropical forests (DTF), mediterranean (MED), boreal forests (BF), tropical montane forests (TMF), savannas (SAV), southern temperate forests (STF), temperate coniferous forests (TCF), temperate deciduous forests (TDF), and moist tropical forests (MTF) studied by Tedersoo et al. (2014). Note the extensive differences in abundance between biotopes
Fig. 7 The diversity of soil yeasts (species and genera) in all types of biotopes: arctic tundra (AT), grassland and shrubland (GS), dry tropical forests (DTF), mediterranean (MED), boreal forests (BF), tropical montane forests (TMF), savannas (SAV), southern temperate forests (STF), temperate coniferous forests (TCF), temperate deciduous forests (TDF), and moist tropical forests (MTF) studied by Tedersoo et al. (2014). Note the large unidentified number of species and genera in all biotopes
At the genus level, the most abundant genus, Solicoccozyma (12,528, 24.7%), was found in most of the studied biotopes with a relatively high abundance, from 300 in AT to 5254 in STF. DTF (6), MTF (16), and SAV (66) were the low-abundance exceptions. The second most abundant genus, Saitozyma (9539, 18.8%), also had a relatively high abundance in most biotopes, ranging from 143 in DTF to 3114 in MTF, except for GS (18). The third most abundant genus, Glaciozyma (2350, 4.6%), was found mainly in STF (823). The remaining 84 genera had a low proportion, with an abundance < 1.6% among all biotopes.
Compositional differences among samples were visualized using non-metric multidimensional scaling (NMDS) in the vegan R package (Oksanen et al. 2020) with Bray-Curtis distance measure on the Hellinger-transformed matrix. We performed PerMANOVA (adonis) to estimate the amount of variation explained by the biome (categorical) and a range of continuous environmental variables including climatic and edaphic parameters (Fig. 8). Biome was a significant source of variation in community composition of yeasts, explaining ca. 17.3% of the variation (Table 3). In the combined model composed of continuous environmental variables, mean annual temperature (MAT) explained the largest fraction of variation (9.7%), followed by soil pH (2.2%), latitude (1.8%), C/N ratio (1.6%), and mean annual precipitation (MAP, 1.4%). Other edaphic variables explained less than 1% of the variation, although remained significant in the combined model, with the exception of Mg that was marginally significant (Table 3).
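The two preprocessing steps named above, the Hellinger transformation and the Bray-Curtis dissimilarity, can be sketched in a few lines of Python. This is not the vegan implementation used in the analysis, and the ordination (NMDS) and PerMANOVA steps are not reproduced; the toy count matrix and all function names are hypothetical.

```python
import math

def hellinger(row):
    """Hellinger transformation: square root of relative abundances."""
    total = sum(row)
    if total == 0:
        return [0.0 for _ in row]
    return [math.sqrt(value / total) for value in row]

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator if denominator else 0.0

def distance_matrix(counts):
    """Pairwise Bray-Curtis distances on Hellinger-transformed rows."""
    transformed = [hellinger(row) for row in counts]
    n = len(transformed)
    return [[bray_curtis(transformed[i], transformed[j]) for j in range(n)]
            for i in range(n)]

# Hypothetical samples (rows) by OTUs (columns) read-count matrix.
toy_counts = [
    [10, 0, 5],   # sample A
    [20, 0, 10],  # sample B: same proportions as A, twice the depth
    [0, 7, 0],    # sample C: shares no OTU with A or B
]
dist = distance_matrix(toy_counts)
for row in dist:
    print([round(value, 3) for value in row])
```

In this toy example, samples A and B differ only in sequencing depth and therefore have a distance of zero after transformation, whereas sample C shares no OTU with them and lies at the maximum distance of one.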
While the above results give some crude overview of the currently known richness and compositional patterns of yeasts found in soil samples collected from all major biomes, it is important to keep in mind their limitations. Several biomes are still undersampled and we lack samples from many ecoregions and habitat types in each biome. Furthermore, a large fraction of yeasts inhabit above-ground microhabitats, such as tree bark and sap, flowers, fruits and these may be present in such a low biomass in soil that many are missed by random soil sampling. In order to obtain a more complete picture of the diversity and distribution of yeasts in natural habitats, more environmental microbiome studies are needed from a wide range of microhabitats, including but not limited to the ones mentioned above.
Soil yeasts have been intensively studied in the past. A few studies reported large-scale biodiversity assessments of cultivable yeasts, convincingly demonstrating that their distribution patterns are influenced by both climate and environmental parameters (Vishniac 2006; Yurkov 2017). The three widespread genera of soil yeasts, Apiotrichum, Saitozyma, and Solicoccozyma, were found in all studied types of biotopes with ITS2 metabarcoding (Fig. 7) and cultivation alike. Though species composition across biotopes and regions often showed similarities, abundance and incidence of dominating species (community structure) distinguish them. Environmental factors such as mean annual temperature, annual rainfall, and electrical conductivity could explain up to ca. 44% of the distribution of the prominent yeast species along a latitudinal gradient (Vishniac 2006). The same factors explained the largest fraction of variation of ITS2 OTUs in the analysis presented here. Cultivation-based studies revealed several indicator yeast species and extremophiles, e.g., Nadsonia starkeyi-henricii (Cif.) Kurtzman & Robnett, Sa. podzolica, So. aeria (Vishniac 2006; Yurkov et al. 2012b; Buzzini et al. 2018) (Fig. 7). Whether or not these psychrophilic species can survive in these biotopes requires further clarification. Repeated sampling, sample pre-treatments and estimations of sampling effort substantially improved species discovery rates from soils (Yurkov et al. 2016a, b). It has also been demonstrated that a large proportion of soil yeasts represent putatively novel species. Results of ITS2 metabarcoding revealed a large proportion of sequences that were not identified to a species or genus. Whether or not these represent potential novel taxa and not divergent sequence types or artefacts requires further clarification.
The two most extensively sampled forest biomes, Temperate broadleaf and mixed forests and Mediterranean forests, woodlands, and scrub, yielded around 100 yeasts (Yurkov et al. 2016a, b). Also, species richness estimations suggested that Mediterranean forests are potentially more species-rich than Central European mixed forests (Yurkov et al. 2016b). In contrast to this observation, ITS2 metabarcoding revealed higher sequence richness in temperate deciduous forests (TDF) than in Mediterranean biotopes (MED) (Fig. 6). Though yeasts are probably one of the best sequenced groups of Fungi, our analysis highlighted the problem of many missing reference sequences. Further bioinformatic research should be directed towards creating better datasets of reference sequences and phylogenetic placement of potential novel species from metabarcoding libraries (e.g., Mašínová et al. 2017).

Fig. 8 Compositional differences of yeast fungal communities among biomes, based on the global soil fungal dataset from Tedersoo et al. (2014) re-analyzed here using nonmetric multidimensional scaling (NMDS) on Hellinger-transformed data. Circles indicate standard deviation of within-biome variation, while vectors indicate environmental variables significantly correlated with yeast community structure. Abbreviations MAT: mean annual temperature, MAP: mean annual precipitation, C/N: carbon and nitrogen ratio, while chemical elements are indicated by their symbols
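The Hellinger transformation named in the Fig. 8 caption can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names and the small count table are hypothetical, and only the transformation itself (square root of each taxon's relative abundance within a sample) is standard. The NMDS step is not shown.

```python
import math


def hellinger_transform(counts):
    """Hellinger-transform a samples x taxa table of read counts.

    Each count is divided by its sample (row) total and square-rooted.
    Samples with no reads are returned as rows of zeros.
    """
    transformed = []
    for row in counts:
        total = sum(row)
        if total == 0:
            transformed.append([0.0] * len(row))
        else:
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed


def euclidean(a, b):
    """Euclidean distance between two transformed samples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical OTU table (rows: soil samples, columns: yeast OTUs).
# These numbers are illustrative only and are not taken from the study.
otu_counts = [
    [90, 10, 0],  # sample A
    [45, 5, 0],   # sample B: same composition as A at half the read depth
    [0, 0, 20],   # sample C: shares no OTU with A or B
]

hellinger = hellinger_transform(otu_counts)
dist_ab = euclidean(hellinger[0], hellinger[1])
dist_ac = euclidean(hellinger[0], hellinger[2])
```

In this sketch samples A and B end up at distance zero because the transformation removes differences in sequencing depth, while A and C, which share no OTUs, reach the maximum distance of the square root of 2. This bounded behaviour is one reason the transformation is commonly applied to community data before ordination.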

Case 2. The extent of yeasts occurring in marine waters: The Malassezia case
Yeasts have been documented from marine waters and other substrates (Fell 2012). Gareth Jones et al. (2015, 2019) provided an update on the fungal species, including yeasts, isolated from this habitat. In total 1112 fungal species belonging to 472 genera were documented, with approximately 140 Saccharomycotina (Ascomycota) species, and 35 Tremellomycetes yeast species (Agaricomycotina), 35 Pucciniomycotina yeast species, and 4 Ustilaginomycotina yeast species (all Basidiomycota). Note that in this study only one species of Malassezia was listed, namely Malassezia furfur (C.P. Robin) Baill.
Malassezia are among the most widespread and ecologically plastic basidiomycetous yeasts on the planet. Malassezia dominate the fungal community of human skin (Findley et al. 2013) and are present on most warm-blooded animals. Members of the genus Malassezia are single-celled microorganisms, and most species are lipid-dependent. Almost all the known species lack fatty acid synthase (Wu et al. 2015) and rely completely on exogenous lipids to grow, rendering them difficult to isolate in axenic culture. They also contain abundant and novel proteases, which have been shown to inhibit biofilm formation in bacteria, making them an important model for understanding microbial interactions. Like other yeasts, Malassezia have likely been shaped by convergent evolution through genome reduction millions of years ago, probably from a much more complex multicellular ancestor (Nagy et al. 2014). However, all known Malassezia species have unusually small nuclear genomes, even compared with other types of yeasts, and a large number of genes horizontally transferred from bacteria (Wu et al. 2015; Ianiri et al. 2020).
Intriguingly these yeasts have been found in a variety of marine environments from the water column to deep-sea sediments to hydrothermal vents (Amend 2014). The accumulating evidence suggests that these microorganisms are among the most abundant and widespread fungal symbionts in the ocean (Amend 2014). Among the 17 described species, two species, namely Malassezia restricta E. Guého, J. Guillot & Midgley, and Malassezia globosa Midgley, E. Guého & J. Guillot are consistently reported from almost every marine invertebrate species examined, including corals, sponges and sea urchins, although their role in these putative symbiotic systems is unclear. Note that these species are also the most important inhabitants of human and animal skin (Gaitanis et al. 2012;Findley et al. 2013).
No species has yet been cultivated from marine environments. However, phylogenetic reconstructions based on 26S/28S ribosomal DNA sequences (Fig. 9, Suppl. Table SI2) of environmental sequences downloaded from GenBank show a remarkable diversity of uncharacterized, putatively novel Malassezia taxa or even new Malassezia-like clades occurring in marine habitats (Fig. 9; e.g., Clade C). The isolation and genome sequencing of these marine Malassezia-like fungi is crucial to characterize the current diversity, to understand the phylogenetic relationships between the different species and clades, and to better understand how the evolutionary transitions between marine and terrestrial environments occurred, as well as adaptations to human and animal skin. The striking diversity of Malassezia species from marine environments suggests that the currently known diversity of marine fungi may represent only the tip of the iceberg, while most fungi from these environments are yet to be discovered.

Fig. 9 […] Table 2). Sequences were aligned with CBS reference strains. Clades that correspond with existing species are indicated by color code. The tree topology is generally poorly supported because of the short single locus used. Despite comparatively poor resolution, there appear to be several putatively novel lineages, indicated with a red star. Clade A, containing M. restricta and M. arunalokei, and clade B, containing M. globosa, are the most abundant in marine habitats and contain much of the putatively novel biodiversity

[…] Garcia et al. 2020; Pontes et al. 2020; Yurkov et al. 2020). Most yeasts isolated from cold habitats belong to the Basidiomycota; only a few ascomycetous species were found, mainly in the genera Candida, Debaryomyces, Meyerozyma, and Pichia (Buzzini et al. 2018; Sannino et al. 2017). For many years the previously polyphyletic genera Cryptococcus and Rhodotorula were reported as the dominant genera of yeasts from cold environments, and some discussion focused on their superior ability to overcome the existing harsh conditions, thanks to physiological mechanisms conferring cold tolerance (e.g., synthesis of polysaccharide capsules, prevalence of unsaturated fatty acids in the membranes) (Buzzini et al. 2012, 2018; Sannino et al. 2017). The taxonomic revision of Pucciniomycotina and Tremellomycetes taxa (Wang et al. 2015c) split Cryptococcus and Rhodotorula into smaller monophyletic genera and, as a consequence, the names of yeast species in these peculiar environments changed. At present, most isolated basidiomycetous yeasts belong to the genera Cystobasidium, Dioszegia, Filobasidium, Glaciozyma, Holtermanniella, Leucosporidium, Mrakia, Naganishia, Phenoliferia, Rhodotorula, Solicoccozyma, and Vishniacozyma (Buzzini et al. 2017, 2018; Sannino et al. 2017).
The Industrial Yeasts Collection (DBVPG) of the University of Perugia (Italy) (www.dbvpg.unipg.it) preserves over 1,600 psychrophilic and psychrotolerant yeasts isolated during several sampling campaigns in cold environments (i.e., Arctic, Antarctica, and non-polar regions, Italian Alps and Apennines). Of these, 23% belong to psychrophilic species, while the rest show a psychrotolerant aptitude, i.e., a maximum growth temperature above 20 °C. These yeasts belong to 135 species, and 16 of them (8%) have been described just in the last 10 years.
Apparently, cold habitats still retain a large undescribed yeast biodiversity, which is in danger of extinction mainly due to global warming (discussed in Yurkov et al. 2020). Arctic and global alpine land regions and Arctic sea ice are defined as global warming hotspots (IPCC 2018). The expected temperature increase of 1-3.7 °C by 2100 (Collins et al. 2013) will result in the loss of these habitats for plants, animals, and microorganisms. Timely yeast biodiversity assessments in such habitats are strongly recommended. Investigations of glaciers in tropical Africa (Mt Kenya and Mt Kilimanjaro), Asia (Himalaya), and South America (Northern Andes) that, to the best of our knowledge, have never been sampled for the presence of psychrophilic microorganisms, including yeasts, are even more important because these habitats are disappearing at an ever-increasing speed.
One hundred and forty-five species have been reported from Chinese fermented products, such as Baijiu (Chinese liquor), Huangjiu, Jiuqu (including Daqu, Xiaoqu, Fuqu and other starters of fermented alcoholic beverages), wine, koumiss and fermented milk, fermented vegetables, and tea (Suppl. Table SI3). Although many different yeast species take part in various food and beverage fermentations, S. cerevisiae plays a special role in those fermentations. Aside from S. cerevisiae, Debaryomyces spp., Kazachstania spp., […] zemplinina Sipiczki are the most common yeasts found to be associated with these traditional fermentations. Baijiu is one of the oldest distilled liquors in the world, and its production currently exceeds 12 million metric tons annually in China (Zheng and Han 2016). The fermentation of Baijiu is a uniquely complex process involving more than 58 yeast species (Suppl. Table SI3). Recently, the diversity of yeasts has been investigated in the Maotai grain, used for the production of the famous Chinese liquor, by cultivation and high-throughput sequencing methods (Hao et al. 2019). In total, 59 genera and 129 species of yeasts were detected from the first deposits of grain to the fifth round of […]

[…] (Suppl. Table SI3) have been isolated from sourdough starters used for making Mantou (Chinese steamed bread), a traditional Chinese fermented staple food. The most frequently isolated species were S. cerevisiae and W. anomalus (Han 2013). Forty-three species were found in wine fermentations (Suppl. Table SI3).

Conclusions and recommendations
After two centuries of research on yeast diversity, we have barely scratched the surface. Our knowledge of yeast diversity and distribution is fragmentary in many respects. Many countries, biomes and habitats have not been explored in depth or have not been studied at all. Even though DNA-metabarcoding studies have the power to document a vast diversity of yeasts in, e.g., soils, air and seawater, and may also provide insights into the extent of yet unknown species, living cultures are still needed to describe them (Yurkov et al. 2021). This is not only because current nomenclatural practices demand a physical specimen as the type, a practice that may or may not change in the future, but also because living isolates may boost knowledge on biotechnological, biomedical, agricultural, and food-fermentation applications, which in turn may benefit economically under-developed regions of the world that may harbor a rich yeast diversity. Despite strong progress in molecular biology tools, identification of yeasts in the environment may be demanding when a single barcode does not discriminate species (Boekhout et al. 2021a). Properly preserved living cultures can provide high-quality genome and transcriptome data to find new and better genetic markers and to elucidate essential processes in the environment.
At the past and current rates of taxonomic activity, we would need from 175 to about 500 years to document at least 50% of the yeast species that may exist on our planet. Thus, an acceleration of this species discovery process is urgently needed, also in the context of climate change and vanishing habitats, such as dwindling glaciers, melting tundra, bleaching coral reefs, and conversion of natural forests. High-throughput cultivation studies can still capture species inhabiting these threatened environments.
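The kind of arithmetic behind such a time estimate can be sketched as follows. This is an illustration under a constant-rate assumption, not the calculation used by the authors; the function name, its parameters, and the input values are all hypothetical placeholders and do not come from this article.

```python
def years_to_document(known, estimated_total, new_per_year, target_fraction=0.5):
    """Years needed to reach a target fraction of all species.

    Assumes a constant description rate (new_per_year), which is a
    simplification: real discovery rates change over time.
    Returns 0.0 if the target has already been reached.
    """
    if new_per_year <= 0:
        raise ValueError("new_per_year must be positive")
    remaining = target_fraction * estimated_total - known
    return max(0.0, remaining / new_per_year)


# Hypothetical inputs for illustration only (not data from the article):
# 1,000 described species, 10,000 species estimated to exist,
# 20 new species described per year.
example_years = years_to_document(known=1000, estimated_total=10000, new_per_year=20)
```

With these placeholder inputs the estimate is 200 years. Because the result scales inversely with the description rate, doubling the rate halves the time needed, which is why accelerating species discovery has such a direct effect on the outlook.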
Comparative genomics investigations showed an increase in the number of hybrid yeasts that may be difficult or impossible to identify using traditional methods. The discovery of hybrid yeasts may be boosted by a strongly augmented sequencing capacity that will also contribute to the in-silico analysis of metabolic pathways and networks, which in turn may also contribute to the use of such not yet explored hybrid yeasts in applied areas.
The above-described scenario will likely benefit most from strong collaborations between scientists from different areas, whether geographical, ecological, technical, fundamental or applied, rather than from competition between scientists. The statement 'Fortunately those who start now', made by Prof. Martinus Beyerinck, the first professor of Microbiology at what is now the Technical University of Delft, The Netherlands, in his farewell address one century ago, still holds true.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.