Background

Yeasts, which are common in soil, play an important role in maintaining the ecological functioning of the soil, promoting plant growth, and protecting plants from pests and diseases [1]. Yeasts isolated from soil (e.g., Filobasidium magnum, Naganishia albida, and Lipomyces spp.) have been found to produce extracellular polymeric substances that help them resist extreme environmental disturbances; in the process, these substances promote the formation of soil aggregates and enhance the stability of the soil structure [2,3,4]. Plant roots support the survival of yeast species by secreting carbohydrates and organic acids (i.e., amino acids and carboxylic acids). Yeasts, in turn, contribute to plant growth and development by solubilizing large amounts of nutrients, such as phosphorus and calcium [5,6,7,8]. Additionally, some soil yeasts act as antagonists of pathogens, such as Verticillium dahliae and Pythium aphanidermatum, and thus protect plants from disease [8, 9]. The size, diversity, and structure of the soil yeast community are known to be influenced by factors such as soil type, plant species, and geographic location [1]. Moreover, special ecological environments can help yeast species develop tolerance to conditions such as high or low temperature, drought, and salinity [10]. For example, psychrophilic yeasts can be isolated from glaciers [11]. Therefore, the study of yeast diversity, community structure, and adaptation strategies in soils under special environments is essential for the development and utilization of yeast resources.

Xinjiang is located in the hinterland of Eurasia, a transition zone between the dry summer zone of Europe and the humid summer belt of East Asia [12]. The special climatic conditions of this region, such as large differences in temperature between day and night and its long hours of daylight, promote the richness of melon and fruit resources [13]. Rich sugar sources in orchard ecosystems promote yeast survival. Meanwhile, the harsh natural environment of dry summers and cold winters has contributed to the evolution of yeast and thus to the accumulation of yeast diversity [12]. Hami melons are popular worldwide and are considered to be a national geographic product and the king of melons in China due to their pleasant aroma, crisp taste, sweetness, and color [14]. The central production areas of Hami melon are the Turpan-Hami Basin, the northwestern and southwestern Tarim Basin, and the north slope of Tianshan Mountain [15, 16]. Currently, the research on Hami melon yeast is mainly focused on the screening of antagonistic yeast to prevent postharvest diseases and control the bacterial fruit blotch disease [17,18,19,20,21]. However, the diversity and composition of yeasts and the ecological factors that influence the yeast community in the soil of Hami melon orchards in different areas of Xinjiang are unknown; such information will provide an in-depth understanding of the adaptation mechanism of Hami melon soil yeast species and the collection and collation of yeast resources in Xinjiang.

In recent years, research on yeast species from orchard soils has been carried out using culture-dependent methods. These methods are useful for isolating diverse yeast cultures, enriching strain collections, and screening strains of value to the food, industrial, and medical sectors; however, only a few yeast species have been identified in soil samples using culture-dependent methods, and the scope for studying microbial population dynamics in an individual environment is limited compared with culture-independent methods [22, 23]. Illumina MiSeq high-throughput sequencing is now widely used; it allows comprehensive and accurate detection of species composition and generates a large volume of data with greater coverage than traditional culture methods [24]. However, its long run times and short read lengths are not optimal for small-scale sequencing [25, 26]. This study aimed to quantitatively analyze the diversity and structure of rhizosphere soil yeast communities in Hami melon orchards in different regions of Xinjiang (Fig. 1) using Illumina MiSeq high-throughput sequencing and to explore the environmental factors that shape differences in yeast community structure among regions. Our results offer new insights into the diversity and structure of yeast communities in the soil of Hami melon orchards in different regions of Xinjiang, providing supplemental information on the yeast resources in Xinjiang orchards.

Fig. 1

Sampling locations and geographic distribution of all rhizosphere soil samples of Hami melon in Xinjiang, China. Here, SX, EX, and NX represent the sampled areas of Hami melon in Southern Xinjiang, Eastern Xinjiang, and Northern Xinjiang, respectively; KS, AK, TL, HM, TC and TS represent the sampled locations in Kashgar and Aksu Prefecture of Southern Xinjiang, Turpan and Hami Prefecture of Eastern Xinjiang, and Changji and Shihezi Prefecture of Northern Xinjiang, respectively. Each sample had three replicates (not shown in the figure). Digital Elevation Model (DEM) provided by the Chinese Academy of Sciences Geospatial Data Cloud Platform

Results

Sequencing analysis and the richness of yeast communities

After removing chimeras and low-quality reads, we obtained 1,952,961 fungal sequence reads of the D1 domain of the large subunit (LSU) rRNA gene from 54 soil samples. After removing non-yeast sequence reads, a total of 31,948 yeast sequence reads were retained and clustered at 97% sequence similarity, yielding 86 operational taxonomic units (OTUs). Rarefaction curves for all samples plateaued, indicating that the sequencing depth per sample was adequate to capture the diversity at the study sites (Fig. 2). In addition, we divided all samples into three large groups according to their geographical locations: Southern Xinjiang group (SX), Eastern Xinjiang group (EX) and Northern Xinjiang group (NX).

Fig. 2

Rarefaction curves of rhizosphere soil samples. OTUs were clustered at a dissimilarity threshold of 3%. Each sample had three replicates (replicates are not shown individually in the legend but were included in the analysis). Sample abbreviations are the same as in Fig. 1
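To make the plateau criterion concrete, the following minimal sketch computes the expected number of OTUs in a random subsample of n reads using the standard hypergeometric rarefaction formula. It is an illustration only: the OTU counts are invented and are not data from this study, and the function name `expected_richness` is hypothetical. The curves in Fig. 2 were drawn with the R packages named in the Methods, not with this code.

```python
from math import comb

def expected_richness(otu_counts, n):
    """Expected number of OTUs seen when n reads are drawn without
    replacement from one sample (hypergeometric rarefaction)."""
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total read count")
    all_draws = comb(total, n)
    # For each OTU: 1 - P(none of its reads is drawn).
    return sum(1 - comb(total - c, n) / all_draws for c in otu_counts)

# Hypothetical sample: one dominant OTU plus several rarer ones.
counts = [120, 30, 10, 5, 2, 1, 1]
curve = [(n, expected_richness(counts, n)) for n in (1, 10, 50, 100, sum(counts))]
for n, s in curve:
    print(f"{n:4d} reads -> {s:.2f} expected OTUs")
```

A curve is said to plateau when adding further reads raises the expected OTU count only marginally, which is the behaviour reported for the samples in Fig. 2.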

The observed species richness (Sobs), estimated richness (Chao1 and ACE indices) and species diversity (Shannon and Simpson indices) showed that yeast richness was higher in Northern Xinjiang than in Southern and Eastern Xinjiang, whereas diversity was significantly lower (P < 0.05) (Table 1). In the intergroup comparison, the Sobs, Chao1 and ACE values of the samples from Northern Xinjiang (NX) were the highest among the three groups, but the differences were not significant. The Shannon index was significantly higher in Southern Xinjiang (SX) and Eastern Xinjiang (EX) than in Northern Xinjiang (NX), and the Simpson index was significantly higher in Northern Xinjiang (NX) than in Southern Xinjiang (SX) and Eastern Xinjiang (EX).
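The opposite trends of the Shannon and Simpson indices can be illustrated with a short sketch. It assumes that the Simpson index is reported in its dominance form D, in which larger values mean lower diversity; this is an assumption consistent with the pattern described above, not a statement taken from the source. The Chao1 function uses the bias-corrected form. The two count vectors are invented for illustration.

```python
from math import log

def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i)."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)

def simpson_dominance(counts):
    """Simpson dominance D = sum(n_i(n_i-1)) / (N(N-1)); higher D = lower diversity."""
    total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))

def chao1(counts):
    """Bias-corrected Chao1 = S_obs + F1(F1-1) / (2(F2+1)),
    where F1 and F2 are the numbers of singleton and doubleton OTUs."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical communities with 100 reads each.
even = [25, 25, 25, 25]        # four equally abundant OTUs
dominated = [94, 2, 2, 1, 1]   # five OTUs, one of them dominant

for name, counts in (("even", even), ("dominated", dominated)):
    print(name, round(shannon(counts), 3),
          round(simpson_dominance(counts), 3), round(chao1(counts), 2))
```

The dominated community contains more OTUs yet has the lower Shannon index and the higher Simpson dominance, which mirrors the combination of high richness and low diversity reported for NX.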

Table 1 Alpha diversity indices of yeast in rhizosphere soil of Hami melon from different samples

Yeast community composition

The numbers of yeast sequence reads and OTUs detected in samples from SX, EX, and NX were 4268 and 57, 5616 and 59, and 22,065 and 55, respectively. We found that 34 OTUs were shared by all three groups; OTUs species were shared between SX and EX; 41 OTUs were shared between SX and NX; and 38 OTUs were shared between EX and NX (Fig. 3). We identified 86 OTUs, 59 genera, and 86 species, which belonged to Ascomycota and Basidiomycota. Ascomycota contained 45 OTUs, 27 genera, and 45 species, accounting for approximately 9.6% of all yeast sequences, while Basidiomycota contained 41 OTUs, 32 genera, and 41 species, accounting for approximately 90.4%. These included six genera of yeast-like fungi: Aureobasidium (0.54%), Microglossum (0.15%), Basidioascus (0.04%), Hormonema (0.03%), Cyphellophora (0.01%), and Tilletiopsis (0.01%). A total of 36 rare species (species with a frequency of occurrence below 1%) were detected (Tables 2 and 3). The dominant genera, each accounting for more than 1% of sequences, were Filobasidium (54.97%), Vishniacozyma (7.32%), Solicoccozyma (6.41%), Malassezia (5.13%), Sporobolomyces (4.01%), Cutaneotrichosporon (3.16%), Naganishia (2.07%), Udeniomyces (1.92%), Colacogloea (1.82%), Pichia (1.54%), Saitoella (1.41%), and Mrakia (1.20%). The dominant species, each accounting for more than 1% of sequences, were Filobasidium magnum (54.97%), Vishniacozyma tephrensis (7.32%), Solicoccozyma aeria (6.41%), Malassezia sp. ‘phylotype 131’ (4.89%), Sporobolomyces carnicolor (3.05%), Naganishia albida (2.07%), Udeniomyces sp. 1 AK-2015 (1.69%), Colacogloea philyla (1.66%), Cutaneotrichosporon curvatum (1.60%), Cutaneotrichosporon cutaneum (1.56%), Saitoella complicata (1.41%), Pichia kudriavzevii (1.29%), and Mrakia gelida (1.18%) (Tables 2, 3, Fig. 4a). The 12 dominant genera and 13 dominant species accounted for 90.96 and 89.1% of all yeast sequences, respectively.
Filobasidium magnum, the most dominant species overall, accounted for 75.36% of yeast sequences in NX, compared with 17.90% in SX and 3.03% in EX. Vishniacozyma tephrensis was the second most dominant species: SX (14.50%), EX (0.05%), and NX (7.78%). The proportion of Solicoccozyma aeria was 35.83, 19.25, and 0.67% in SX, EX, and NX, respectively. Filobasidium magnum was the most dominant species in the samples from both Southern (SX) and Northern Xinjiang (NX), whereas Solicoccozyma aeria was the most dominant species in Eastern Xinjiang (EX) (Fig. 4b).
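The link between the group-level and overall percentages can be sketched as a read-weighted average. The example below uses only the read counts and percentages reported above for Filobasidium magnum and Vishniacozyma tephrensis; the function name `pooled_percentage` is hypothetical, and the sketch assumes that each group-level percentage is expressed relative to the yeast reads of that group.

```python
# Yeast reads per regional group, as reported above.
reads = {"SX": 4268, "EX": 5616, "NX": 22065}

# Within-group relative abundance (%) of two dominant species, as reported above.
within_group = {
    "Filobasidium magnum": {"SX": 17.90, "EX": 3.03, "NX": 75.36},
    "Vishniacozyma tephrensis": {"SX": 14.50, "EX": 0.05, "NX": 7.78},
}

def pooled_percentage(group_pct, group_reads):
    """Read-weighted mean of group-level percentages."""
    total = sum(group_reads.values())
    return sum(group_pct[g] * group_reads[g] for g in group_reads) / total

pooled = {sp: pooled_percentage(pct, reads) for sp, pct in within_group.items()}
for species, value in pooled.items():
    print(f"{species}: {value:.2f}% of all yeast reads")
```

Under that assumption the weighted averages round to 54.97% and 7.32%, matching the overall values given for these two species. Because NX contributes most of the reads, the overall composition is weighted heavily towards the NX community.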

Fig. 3

Venn diagram at the OTU level of soil samples in Southern Xinjiang (SX), Eastern Xinjiang (EX) and Northern Xinjiang (NX). Each colored circle in the diagram represents a group; the number in the central overlap is the number of OTUs common to all groups. The shared and unique yeast OTUs are shown at a 0.03 dissimilarity distance after removing singletons

Table 2 Ascomycetous yeast taxa (accounting for 9.6% of yeast sequences) in the Illumina sequencing library
Table 3 Basidiomycetous yeast taxa (accounting for 90.4% of yeast sequences) in the Illumina sequencing library
Fig. 4

Proportion of dominant yeasts in soil samples. (a) all samples; (b) the samples in Southern Xinjiang (SX), Eastern Xinjiang (EX), and Northern Xinjiang (NX). “Others” denotes species that each accounted for less than 1%. Sample abbreviations are the same as in Fig. 1

Analysis of between-group differences at the phylum level showed that the proportion of Basidiomycota was significantly higher than that of Ascomycota in the soil samples from all three groups; Basidiomycota was therefore considered the dominant phylum. The proportion of Ascomycota in EX was significantly higher than that in SX and NX (P < 0.05) (Fig. 5b). At the genus level, 11 of the 12 dominant genera differed significantly in relative abundance among SX, EX, and NX (P < 0.05); the exception was Udeniomyces. Filobasidium and Vishniacozyma were mainly present in the samples from Southern and Northern Xinjiang; Sporobolomyces, Cutaneotrichosporon and Saitoella were detected mainly in samples from Southern and Eastern Xinjiang; and Solicoccozyma and Mrakia were found mainly in Eastern and Southern Xinjiang, respectively (Fig. 5a).

Fig. 5

Species difference analysis of the samples in Southern Xinjiang (SX), Eastern Xinjiang (EX) and Northern Xinjiang (NX): (a) at the genus level; (b) at the phylum level. The y-axis represents the classification levels of species, and the x-axis represents the percentage of species average relative abundance in each sample group. The red, blue, and green columns represent the average results in the SX, EX, and NX soil samples, respectively. The Kruskal-Wallis rank-sum test was used to show significant differences (*: 0.01 < P ≤ 0.05, **: 0.001 < P ≤ 0.01, ***: P ≤ 0.001)

Relationship between yeast communities in samples from different regions

We performed ordination by PCoA at the OTU level to reveal similarities and differences in community composition among the grouped samples (Fig. 6). The first and second principal coordinate axes (PCoA1 and PCoA2) explained 23.86 and 10.63% of the variance, respectively. The eigenvalues were relatively small: even PCoA1 captured less than 50% of the variation in the input data, so the ordination cannot be considered very successful. However, the R value from the analysis of similarities (0.6144) was greater than 0, indicating that the differences between sample groups were greater than the differences within groups, and the difference was significant (P < 0.05). Overall, most samples from each group clustered together, with only slight overlap among the three groups on the score plot, indicating significant differences in community composition between groups. Among the groups, SX and EX were more similar to each other in community composition, a result that can also be seen in the box plot of the PCoA (Fig. S1).
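The meaning of the R value can be illustrated with a minimal sketch of the ANOSIM statistic computed on Bray-Curtis dissimilarities, taking R to be the statistic of the ANOSIM test named in the Methods. The OTU table, group labels and function names below are hypothetical, and the code is a simplified stand-in for the vegan implementation used in the study. R is close to 0 when between-group and within-group dissimilarities are alike and approaches 1 when every between-group pair is more dissimilar than every within-group pair.

```python
import random
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return sum(abs(x - y) for x, y in zip(a, b)) / (sum(a) + sum(b))

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def anosim_r(samples, groups):
    """R = (mean between-group rank - mean within-group rank) / (M / 2),
    where M is the number of sample pairs."""
    pairs = list(combinations(range(len(samples)), 2))
    ranks = average_ranks([bray_curtis(samples[i], samples[j]) for i, j in pairs])
    within = [r for r, (i, j) in zip(ranks, pairs) if groups[i] == groups[j]]
    between = [r for r, (i, j) in zip(ranks, pairs) if groups[i] != groups[j]]
    mean_within = sum(within) / len(within)
    mean_between = sum(between) / len(between)
    return (mean_between - mean_within) / (len(pairs) / 2)

def anosim_p(samples, groups, permutations=999, seed=0):
    """Permutation P value: share of label shuffles with R >= observed R."""
    rng = random.Random(seed)
    observed = anosim_r(samples, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if anosim_r(samples, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (permutations + 1)

# Hypothetical OTU counts: two groups with clearly different composition.
samples = [[30, 5, 0], [28, 6, 1], [25, 8, 0], [27, 7, 1],
           [2, 10, 40], [0, 12, 35], [1, 9, 38], [3, 11, 36]]
groups = ["A"] * 4 + ["B"] * 4
r_stat = anosim_r(samples, groups)
p_value = anosim_p(samples, groups)
print(f"R = {r_stat:.3f}, P = {p_value:.3f}")
```

In this toy table the two groups are completely separated, so R reaches its maximum of 1; the value of 0.6144 reported above indicates strong but incomplete separation.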

Fig. 6

Principal coordinates analysis (PCoA) based on Bray-Curtis distance at the OTU level. Red circles, blue triangles and green diamonds represent samples from SX, EX, and NX, respectively. Each sample had three replicates

Relationship between yeast community structure and environmental factors

The statistical results of soil physicochemical properties showed that the values of conductivity (CO) in the Southern Xinjiang (SX), the levels of organic matter (OM) and total phosphorus (TP) in the Eastern Xinjiang (EX), the pH, total potassium (TK) and available potassium (AK) values in the Northern Xinjiang (NX) are significantly higher than in the other two regions (P < 0.05). Available nitrogen (AN) content of NX was significantly lower than SX and EX (P < 0.05) (Table 4). Redundancy analysis based on yeast genera and soil physicochemical properties in soil samples from different regions showed that the first and second RDA components explained 43.6% of the total variation (Fig. 7a). CO, TP and TK were significantly associated with the yeast community (P < 0.05), and mainly influenced the distribution of samples in the Southern (SX), Eastern (EX) and Northern Xinjiang (NX), respectively. These results suggest a correlation between the yeast community and soil physicochemical properties, particularly total phosphorus (TP) content in the soil. The F-ratio and P values for each soil factor are shown in Table S1.

Table 4 The physicochemical properties of rhizosphere soil of Hami Melon in different regions
Fig. 7

Redundancy analysis (RDA) of (a) the correlation between the yeast community and soil physicochemical properties in all samples from three regions of Xinjiang, and (b) the correlation between the yeast community and climate factors at the genus level. Red, blue, and green symbols represent samples from SX, EX, and NX, respectively. Red and black arrows represent the soil parameters and genera, respectively. Soil physicochemical properties: pH, Conductivity (CO), Organic matter (OM), Total nitrogen (TN), Total phosphorus (TP), Total potassium (TK), Available nitrogen (AN), Available phosphorus (AP), Available potassium (AK). Climate factors: Average annual precipitation (PRECTP), Average annual temperature (TEMP), Average annual land surface temperature (LST), Average annual relative humidity (RH), The annual average net solar radiation intensity received by the earth’s surface (SWGNT)

Analysis of climatic factors at the sampling sites showed that the average annual precipitation (PRCP) and relative humidity (RH) were significantly higher in Northern Xinjiang (NX) than in Southern (SX) and Eastern Xinjiang (EX), and were lowest in Eastern Xinjiang (EX) (P < 0.05). The average annual temperature (TEMP), land surface temperature (LST) and net solar radiation intensity (SWGNT) were higher in Southern (SX) and Eastern Xinjiang (EX) than in Northern Xinjiang (NX), but the differences were not significant (Table 5). Redundancy analysis (RDA) of the correlation between the yeast community and climate factors showed that the first and second RDA components explained 39.4% of the total variation. PRCP, RH and SWGNT were the climatic factors that had significant effects on the distribution of yeast communities (P < 0.05) (Fig. 7b). PRCP and RH were negatively correlated with SWGNT, and were positively correlated with Filobasidium and Vishniacozyma but negatively correlated with Solicoccozyma. SWGNT was positively correlated with Sporobolomyces, Cutaneotrichosporon and Saitoella. The F-ratio and P values for each climate factor are shown in Table S2.

Table 5 The climate factors for 2019 at different sampling locations

Discussion

Yeast diversity in rhizosphere soils of Hami melon orchards

In this study, high-throughput sequencing detected a total of two phyla, 59 genera and 86 species of yeasts (Tables 2 and 3). By comparison, a study combining MALDI-TOF MS and rDNA sequencing identified a total of 60 yeast species in 200 soil samples from five fruit tree species (apple, pear, plum, peach and apricot) at two locations in southwest Slovakia [27]. Moreover, only 16 yeast species were detected in 493 agricultural soil samples from nine locations in Cameroon using a culture-dependent method [28]. This indicates that the rhizosphere soil of Xinjiang Hami melon harbors rich yeast resources with relatively high diversity. On the one hand, the high yeast diversity may be related to the high sugar content of Xinjiang Hami melon (15–18%) compared with other fruits and vegetables such as watermelon (7–11%), tomatoes (7–10%) and apples (10–14.2%) [29,30,31,32]. On the other hand, tillage practices also influence the diversity and abundance of soil microorganisms; for example, crop rotation is more conducive to the accumulation of mycorrhizal species than continuous cropping [33,34,35]. During sampling, we learned that crop rotation is commonly used in Xinjiang Hami melon fields to avoid pests and soil micronutrient deficiencies [36]. Furthermore, epiphytic yeasts from the surfaces of various plant species, which enter the soil with humus during crop rotation, may further increase soil yeast diversity in Hami melon orchards [37]. Consistent with this, the basidiomycetous genera Vishniacozyma, Sporobolomyces, Kockovaella, Rhodotorula and Cystobasidium detected in this study have usually been isolated from plant surfaces in other studies [33, 37].

Ascomycota was the more diverse phylum, but its abundance was much lower than that of Basidiomycota (Tables 2, 3 and Fig. 5b). This challenges the traditional view that ascomycetous yeasts are generally more frequent and abundant in agricultural soils, orchards, and grasslands [38, 39]. Other studies have found basidiomycetous yeasts to be dominant in forest soils [24, 38, 40]. The discrepancy may result from differences in research methods. Although high-throughput sequencing showed basidiomycetous yeasts to be dominant in this study, a culture-dependent method might give the opposite result, because ascomycetous yeasts grow faster than basidiomycetous yeasts in culture [33]. The conclusion of this study is not an isolated case: a previous study of yeasts in citrus orchard soil also found basidiomycetous yeasts to be dominant [41]. In addition, another study showed that the rhizosphere of maize seedlings (20 d) harbored only ascomycetous yeasts, whereas the rhizosphere of senescent plants (90 d) was inhabited by basidiomycetous yeasts [42]. The samples in this study were collected from rhizosphere soil at the ripening stage of Hami melon, which may also account for the higher abundance of basidiomycetous yeasts.

The rare yeasts found in this study accounted for approximately 41.86% of the yeast species in all Hami melon soil samples (Tables 2, 3), a value within the range reported for rare yeasts isolated from fruit tree, forest, grassland, and shrub soils in other studies [27, 43, 44]. Cutaneotrichosporon cutaneum and Cutaneotrichosporon curvatum were also found in most samples, suggesting that the genus Cutaneotrichosporon may be resident in the rhizosphere soil of Hami melon orchards. A strain of Cutaneotrichosporon cutaneum has been found to be highly tolerant to tetracycline antibiotics, chloramphenicol, and copper and zinc ions, and to degrade oxytetracycline with high efficiency, which could help prevent environmental antibiotic contamination [45]. Cutaneotrichosporon curvatum is an oleaginous yeast that can be used for biofuel production [3, 9, 46]. The rhizosphere soil of Hami melon orchards is therefore a potential source of oleaginous yeasts for biodiesel production. Filobasidium magnum, Naganishia albida and Mrakia gelida were three of the dominant species in this study. The first two can produce extracellular polymeric substances that contribute to the stabilisation of the soil structure, and the last can be used in the food industry for brewing low-alcohol beer [47, 48]. In addition, we detected Malassezia restricta, a pathogenic fungus that can aggravate atopic dermatitis (AD), and Tilletiopsis washingtonensis, which produces hydrolases and antifungal compounds and can be used as an antagonist of powdery mildew fungi in agricultural production; however, the abundance of both was low [49, 50].

Drivers affecting differences in yeast diversity and community structure in different regions

Alpha diversity analysis revealed differences in species richness and diversity among the three regions (Table 1), and the structure of the rhizosphere soil yeast communities of Hami melon also differed geographically, with Eastern and Southern Xinjiang being more similar to each other (Fig. 6). We hypothesize that differences in soil physicochemical properties and climate may be the main drivers of the differences in yeast community composition among the three regions of Xinjiang. The Changji and Shihezi areas are in the temperate grey-brown desert soil grey desert soil zone, while the Tarim Basin and the Turpan-Hami Basin are in the warm temperate brown desert soil zone [51]. Soil physicochemical analysis also revealed significant differences in soil properties among the three areas (Table 4), and RDA showed that electrical conductivity (CO), total phosphorus (TP) and total potassium (TK) were significantly correlated with the yeast community (Fig. 7a). These three factors were positively correlated with the dominant yeast genera in Southern, Eastern, and Northern Xinjiang, respectively. The strongest correlation, between total phosphorus (TP) and the yeast community, may reflect the fact that phosphorus is a key element in nutrient exchange between plants and yeasts [52,53,54].

The meteorological data showed that radiation intensity and precipitation differed considerably among the three regions of Xinjiang (Table 5), and redundancy analysis showed that the average annual precipitation (PRCP), relative humidity (RH) and net solar radiation intensity (SWGNT) were significantly correlated with the yeast communities (Fig. 7b). Radiation levels differ among Southern (293–322 kJ/cm² per year), Eastern (304–307 kJ/cm² per year) and Northern Xinjiang (262–277 kJ/cm² per year), with the former two being hotter and more evaporative than the latter [55]. Precipitation shows the opposite pattern. Owing to the influence of the warm and humid air currents from Siberia, the climate in Northern Xinjiang is relatively humid, with somewhat more rainfall, whereas Southern and Eastern Xinjiang are surrounded by mountains and are characterised by an arid climate with little rainfall; the more complex topography of Eastern Xinjiang creates a variety of habitat types [56, 57]. Additionally, a previous study showed that the abundance of yeast in soil is positively correlated with soil water content [39]. The high proportion of yeast sequence reads in Northern Xinjiang, together with the dominance there of Filobasidium magnum, a species often isolated from wetter habitats, is consistent with the relatively wetter climate of Northern Xinjiang [58, 59]. The higher precipitation and relative humidity in Northern Xinjiang may have slowed the decomposition of organic matter in the soil, and Filobasidium magnum is able to degrade or transform various organic compounds [4]. In contrast, Solicoccozyma aeria, the most dominant species of the genus Solicoccozyma, prefers arid environments and occurred mainly in Eastern Xinjiang [52].
Taken together, these findings suggest that Filobasidium magnum and Solicoccozyma aeria have the potential to serve as indicator species of ambient humidity. In addition, Southern and Eastern Xinjiang have a high diversity of yeasts, probably because the high level of environmental heterogeneity in these regions facilitates the generation of genetic mutations and the accumulation of genetic variation in yeast [60]. Furthermore, the high quality and strong landrace of Hami melon in Southern and Eastern Xinjiang also reflect the good interplay between the rhizosphere yeast community and the plants [15].

Conclusions

Our results showed that yeast resources were abundant in the soil of Hami melon orchards, and that yeast diversity and community structure differed noticeably among Southern, Eastern, and Northern Xinjiang. The results provide insights into the relationship between the yeast composition of rhizosphere soil in Hami melon orchards and their geographic regions. They also demonstrate that the soil factors conductivity (CO), total phosphorus (TP) and total potassium (TK) and the climate factors average annual precipitation (PRCP), relative humidity (RH) and net solar radiation intensity (SWGNT) influence yeast community structure. This study provides a theoretical basis for better exploitation of soil yeast resources and for understanding their adaptive mechanisms.

Methods

Study sites and sampling

We collected rhizosphere soil samples from Hami melon orchards in six areas within three large regions of Xinjiang between July and August 2019. Study sites included Kashgar (35°20′ - 40°18′ N and 73°20′ - 79°57′ E) and Aksu (39°30′ - 42°41′ N and 78°03′ - 84°07′ E) Prefectures (SX, Southern Xinjiang), Turpan (41°12′ - 43°40′ N and 87°16′ - 91°55′ E) and Hami (40°52′ - 45°05′ N and 91°06′ - 96°23′ E) Prefectures (EX, Eastern Xinjiang), and Changji (43°20′ - 45°00′ N and 85°17′ - 91°32′ E) and Shihezi (43°20′ - 45°20′ N and 84°45′ - 86°40′ E) Prefectures (NX, Northern Xinjiang). In each prefecture, three locations with Hami melon orchards of at least 3 ha in planting area were selected for sampling, and soil samples were collected in triplicate from each orchard (Fig. 1). In total, 54 rhizosphere soil samples were studied. The five-point sampling method was used for sample collection. Briefly, five mature Hami melon plants were randomly selected in each orchard, and the soil around their roots was collected at approximately 10 cm depth using a shovel and sieved to remove plant residues and stones. The rhizosphere soil of the five plants was then mixed evenly and divided into three equal portions. Each sample was stored individually in a sterile self-sealing bag and transported to the laboratory in an ice box (< 10 °C). Each soil sample was crushed, passed through a 2 mm sieve, and divided into two parts: one part was air dried and used for soil physicochemical analysis; the other part was stored at −80 °C for DNA extraction.

The soil types in Kashgar, Aksu, Turpan and Hami Prefecture are clay loam, brown-gray clay loam, sandy loam and sandy clay loam, respectively. The soil types in Changji and Shihezi Prefecture are both loamy clay. Xinjiang has a variety of climate types, with clear distinctions between warm, temperate and cold zones from south to north and between dry and wet zones from east to west. Therefore, we divided all samples into Southern (SX), Eastern (EX) and Northern Xinjiang (NX) groups according to their geographical distribution for subsequent analysis. The climate information for each sampling site is shown in Table 5. Precipitation (PRCP) and temperature (TEMP) data were obtained from the NOAA Climate Prediction Center (https://www.cpc.ncep.noaa.gov/); land surface temperature (LST) and relative humidity (RH) data from NASA GES DISC MERRA2 - inst1_2d_asm_Nx (https://disc.gsfc.nasa.gov/); and net solar radiation intensity (SWGNT) data from NASA GES DISC MERRA2 - tavg1_2d_rad_Nx (https://disc.gsfc.nasa.gov/).

DNA extraction and Illumina MiSeq

E.Z.N.A.® soil DNA Kit (Omega Biotek, USA) was used to extract total DNA from soil samples (0.5 g) following the manufacturer’s protocol. The final DNA concentration was detected using a NanoDrop 2000 UV-Vis spectrophotometer (Thermo Scientific, USA). The integrity of the DNA was assessed using 1% agarose gel electrophoresis. The yeast 26S rDNA was amplified with a pair of specific primers with barcode NL1F (forward primer) (5′-GCATATCAATAAGCGGAGGAAAAG-3′) and NL2R (reverse primer) (5′-CTTGTTCGCTATCGGTCTC-3′) [61]. The PCR reaction system (20 μL) contained 5× FastPfu Buffer (4 μL), 2.5 mM dNTPs (2 μL), primer (5 μM; 0.8 μL each), FastPfu Polymerase (0.4 μL), BSA (0.2 μL), and template DNA (10 ng). The PCR reaction was performed using a thermocycler PCR system as follows: 5 min at 98 °C (denaturation), 30 cycles at 98 °C for 30 s, 52 °C for 30 s, and 72 °C for 45 s, and finally, at 72 °C for 5 min (elongation). The PCR products were analyzed using 2% agarose gel electrophoresis, purified using the AxyPrep DNA Gel Extraction Kit (Axygen Biosciences, USA). The DNA fragments were quantified using QuantiFluor™-ST (Promega, USA) [62]. Equimolar amounts of purified DNA fragments were pooled after individual samples were tagged with indexes through an index PCR, and the Illumina MiSeq PE300 platform (Illumina, USA) was used to perform paired-end sequencing (2 × 300) following the protocol by Meiji Biomedical Technology Co. Ltd. (Shanghai, China).
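The reaction composition above fixes the reagent volumes but leaves the template volume dependent on the DNA concentration of each extract. The sketch below works through that arithmetic for one 20 μL reaction. It assumes that the remaining volume is made up with water, which the protocol above does not state, and the DNA concentration in the example and the function name are hypothetical.

```python
# Fixed reagent volumes (µL) for one 20 µL reaction, as listed above.
reagents_ul = {
    "5x FastPfu Buffer": 4.0,
    "2.5 mM dNTPs": 2.0,
    "forward primer (5 µM)": 0.8,
    "reverse primer (5 µM)": 0.8,
    "FastPfu Polymerase": 0.4,
    "BSA": 0.2,
}
REACTION_UL = 20.0   # total reaction volume
TEMPLATE_NG = 10.0   # template DNA per reaction

def template_and_water(dna_ng_per_ul):
    """Return (template µL, water µL) for 10 ng of DNA in a 20 µL reaction.
    Assumption: the balance of the reaction is water."""
    template_ul = TEMPLATE_NG / dna_ng_per_ul
    water_ul = REACTION_UL - sum(reagents_ul.values()) - template_ul
    if water_ul < 0:
        raise ValueError("DNA too dilute: template volume exceeds the space left")
    return round(template_ul, 2), round(water_ul, 2)

# Hypothetical extract measured at 25 ng/µL.
template_ul, water_ul = template_and_water(25.0)
print(f"template: {template_ul} µL, water: {water_ul} µL")
```

The fixed reagents occupy 8.2 μL, so 11.8 μL remains for template and water; an extract more dilute than about 0.85 ng/μL could not deliver 10 ng within that volume.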

Sequence processing

Raw sequence files were demultiplexed and quality filtered by Trimmomatic and merged by FLASH based on the following criteria: (i) reads with an average quality score < 20 over a 50-bp sliding window were truncated; (ii) sequences with an overlap longer than 10 bp were merged based on their overlapping sequences; (iii) the maximum mismatch ratio allowed in the overlap region of a spliced sequence was 0.2, and non-conforming sequences were eliminated; (iv) the samples were differentiated according to the barcode and primers at the beginning and end of the sequence; the sequence orientation was adjusted, the number of mismatches allowed in the barcode was 0, and the maximum number of primer mismatches was 2 [63,64,65]. OTUs were clustered with a 97% similarity cutoff using UPARSE (version 7.1 http://drive5.com/uparse/), and chimeric sequences were identified and removed using the UCHIME software [66, 67]. The taxonomy of each D1 domain LSU rRNA sequence was assigned by the Ribosomal Database Project (RDP) Classifier algorithm (version 2.2 http://sourceforge.net/projects/rdp-classifier/) [66] against the NCBI database (National Centre for Biotechnology Information, https://www.ncbi.nlm.nih.gov/public/) using a confidence threshold of 0.7 [68]. The observed richness (Sobs), the ACE index, the Chao1 estimator, the Shannon diversity (H) index and the Simpson index were calculated using mothur (version v.1.30.2 https://mothur.org/wiki/chao/, https://mothur.org/wiki/ace/, https://mothur.org/wiki/shannon/, http://mothur.org/wiki/Simpson) with operational taxonomic units (OTUs) defined at the 0.97 level [69]. Next, we plotted rarefaction curves to assess the community richness of each sample and the adequacy of the sequencing depth [62, 66].
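Criteria (i) to (iii) can be sketched in a few lines. The code below is a simplified illustration of a sliding-window quality cut and of the overlap check applied before merging; it is not the implementation used by Trimmomatic or FLASH, and the function names and the example read are hypothetical.

```python
def truncate_by_quality(seq, quals, window=50, min_mean_q=20):
    """Cut the read at the start of the first window whose mean
    quality score falls below min_mean_q (criterion i)."""
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    if not seq:
        return seq, quals
    # Reads shorter than the window are evaluated as a single window.
    for start in range(0, max(len(seq) - window + 1, 1)):
        win = quals[start:start + window]
        if sum(win) / len(win) < min_mean_q:
            return seq[:start], quals[:start]
    return seq, quals

def overlap_acceptable(overlap_len, mismatches,
                       min_overlap=10, max_mismatch_ratio=0.2):
    """Criteria ii and iii: overlap longer than 10 bp and a
    mismatch ratio in the overlap of at most 0.2."""
    return (overlap_len > min_overlap
            and mismatches / overlap_len <= max_mismatch_ratio)

# Hypothetical 120-bp read: 60 high-quality bases, then 60 low-quality bases.
read = "A" * 120
quals = [35] * 60 + [10] * 60
trimmed_seq, trimmed_quals = truncate_by_quality(read, quals)
print(len(trimmed_seq), overlap_acceptable(40, 6), overlap_acceptable(40, 10))
```

In this simplified version the cut is placed at the start of the failing window, so some high-quality bases before the quality drop are also discarded; production trimmers may place the cut differently.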

Determination of soil chemical properties

Here, we evaluated nine soil physicochemical factors (Table 4). The soil water suspension was shaken for 30 min, followed by measurement of pH using a glass electrode meter. A naturally dried soil sample was mixed with water at a ratio of 1:5 (M/V), and conductivity (CO) was determined using the electrode method. The organic matter (OM) was determined by titration with ferrous sulfate, using o-phenanthroline as the indicator, by adding a potassium dichromate-sulfuric acid solution to a test tube containing the soil samples. The available nitrogen (AN) and total nitrogen (TN) were determined by the Kjeldahl method. The available phosphorus (AP) in the soil was extracted with sodium bicarbonate and then determined using the molybdenum blue method. The available potassium (AK) in the soil was extracted with ammonium acetate and determined by flame photometry. Total phosphorus (TP) and total potassium (TK) were measured by acid solubilization [70, 71].

Data analysis

SPSS Statistics v25.0 software (IBM, USA) was used to analyze the soil physicochemical properties and climatic factors. All values are presented as mean ± standard error (mean ± SE). Because the data were not normally distributed, the Kruskal-Wallis test for independent samples was used to compare soil physicochemical properties and climatic factors among groups. Differences were considered statistically significant at P < 0.05. Rarefaction curves were drawn using the “vegan” and “ggplot2” packages in R (v4.0.2), the Venn diagram using the “VennDiagram” package, and the community bar graphs using the “ggplot2” and “ggalluvial” packages in R (v4.0.2). Because the alpha diversity indices did not follow a normal distribution, the Kruskal-Wallis test was used to detect significant differences in alpha diversity indices among the groups. Differences in taxa between groups at the genus and phylum levels were analyzed using the Kruskal-Wallis rank-sum test and plotted with the “ggplot2” package in R (v4.0.2). P values were corrected for multiple testing using the false discovery rate (FDR), and post-hoc pairwise comparisons among the groups were performed after the Kruskal-Wallis H test using the stats package for R and the scipy package for Python. Principal coordinate analysis (PCoA) based on Bray-Curtis distance at the OTU level was performed to analyze similarities and differences in community composition among samples using the “vegan” and “ape” packages in R (v4.0.2). Differences between groups in the PCoA were tested using ANOSIM (analysis of similarities) with the vegan package in R.
Redundancy analysis (RDA) was used to evaluate the relationships between soil factors and yeast communities and between climatic factors and yeast communities, based on the soil physicochemical properties of the samples, local meteorological data, and genus-level community data; it was calculated using Canoco for Windows 5 (Microcomputer Power, USA) [62, 65, 66]. A Monte Carlo permutation test in Canoco was used to identify environmental factors that were significantly associated with yeast community structure.
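The between-group testing described above can be sketched without external packages. The example below computes the tie-corrected Kruskal-Wallis H statistic for three groups and adjusts a set of P values with the Benjamini-Hochberg procedure. Two points are assumptions rather than details given above: the FDR method is taken to be Benjamini-Hochberg, and the P value uses the asymptotic chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2). All input values and function names are hypothetical.

```python
from math import exp

def average_ranks(values):
    """Ranks starting at 1; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_3(groups):
    """Tie-corrected H and asymptotic P for exactly three groups (df = 2)."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    if len(set(pooled)) == 1:
        raise ValueError("all values are identical")
    ranks = average_ranks(pooled)
    total, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        total += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    ties = sum(t ** 3 - t for t in (pooled.count(v) for v in set(pooled)))
    h /= 1 - ties / (n ** 3 - n)
    # Chi-square survival function with 2 degrees of freedom.
    return h, exp(-h / 2)

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=p_values.__getitem__)
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Hypothetical relative abundances of one genus in three regional groups.
h_stat, p_value = kruskal_wallis_3([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# Hypothetical raw P values for four genera tested in parallel.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
print(round(h_stat, 2), round(p_value, 4), [round(p, 4) for p in adjusted])
```

With samples as small as those in this toy example, the chi-square approximation is rough, and an exact or permutation-based P value would be preferable.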