Molecular Phylogenetic Analysis of Candida krusei

Revealing the phylogenetic relationships of Candida krusei strains (sexual form Pichia kudriavzevii) is a prerequisite for understanding the evolution of its virulence-associated mechanisms and ecological lifestyles. Molecular phylogenetic analyses based on entire internal transcribed spacer region (ITS) and multilocus sequence typing (MLST) data were carried out with sequences available in public databases and Hungarian isolates from animals obtained for the study. The ITS haplotype network yielded a high frequency haplotype at the centre of the network (H1; n = 204) indicating that various selective pressure might resulted in population expansion from H1. MLST analysis identified three new genotypes among animal-derived isolates, therefore overall 203 sequence types were investigated to determine the population structure of C. krusei. The most commonly encountered sequence types were ST 17 and ST 67. Phylogenetic analyses showed diverse genetic construction of C. krusei population. Evidence of potential recombination events were also observed that might play some role in high intraspecies genetic variability among strains, however, the limited data of C. krusei genotypes from different countries prevented us to identify accurate evolutionary routes of commensal and pathogenic strains or species-specific lineages. Further expansion of C. krusei MLST database may promote the better understanding of the mixed evolutionary history of this species. Supplementary Information The online version contains supplementary material available at 10.1007/s11046-022-00640-x.

The emergence of antifungal resistance threatens the effective treatment of fungal infections. Azoles are one of the most common antifungal drugs used for long-term antifungal prophylactic treatment of immunocompromised individuals and applied in agricultural practices. C. krusei is a potential multidrugresistant yeast due to its intrinsic resistance to fluconazole (with more than 96% of human clinical and veterinary isolates being fluconazole-resistant) as well as its acquired resistance developed to some other antifungal drugs (amphotericin B, echinocandins) [6,[9][10][11].
Additional specific characteristics of this species is the presence of sexual reproduction. The sexual form (teleomorph) of C. krusei is Pichia kudriavzevii that was proposed by Kurtzman and colleagues in 1980 [12]. P. kudriavzevii isolates are widely distributed in nature and due to its stress tolerance it has an increasing role in biotechnology mainly for production of bioethanol [13]. Moreover, some strains are candidate probiotics [14]. Recent whole genome sequencing supported that there is no genetic distinction between the two species [15]. Phylogenetic analysis of ribosomal DNA (rDNA) and other conserved genes along with genome sequences revealed large evolutionary distance between C. krusei and other medically relevant Candida species that further supported the re-classification of this organism in the Pichia genus [15,16]. In accordance with the guidelines to avoid multiple denomination of any fungus species, the general usage of the name P. kudriavzevii (sexual form) has been suggested but has not yet been enacted; therefore the more conventional name, C. krusei, will be used in our article.
Relatively little genetic or genomic investigation has been carried out on C. krusei isolates so far. Sequence data from multiple loci facilitates determinations of population structures and epidemiological correlates of properties, such as geographical and anatomical origins of isolates and their transmission within and between hosts, therefore multilocus sequence typing (MLST) has gained widespread acceptance as a tool for exploring strain-level differences within microbial species [17][18][19][20]. Another typing scheme based on short tandem repeats for C. krusei have recently been developed that may facilitate the follow up of outbreaks in healthcare settings in the future [21]. A previous MLST study of C. krusei indicated that isolates have diploid, heterozygous genome [18], but phylogenetic analysis using the extended data deposited in MLST database to determine the genetic diversity and population structure of strains has not performed since 2007. The content of the MLST database has expanded since this original report on the genetic structure of C. krusei; therefore the authors felt that new analyses would permit a deeper understanding of the genetics and evolutionary history of this important opportunistic pathogen.

Sample Collection and Identification of Species
Eight C. krusei isolate were cultured from swab samples sent for species identification to our laboratory. All samples were obtained from the uterus of different cows. One isolate from our culture collection was also included in the study, which originated from the oesophagus of a duck diagnosed with gastrointestinal mycosis. Swab samples were inoculated on Sabouraud dextrose agar supplemented with chloramphenicol and incubated at 35°C for 48 h. The cultures were identified by Matrix-assisted laser desorption/ ionization time of flight mass spectrometry (MALDI-TOF MS) and molecular method by sequencing of internal transcribed spacer (ITS) region of fungal rDNA [22].
After culturing isolates for two days, the genomic DNA was extracted and purified directly from a single colony of each isolate by using the Fungi/yeast genomic DNA extraction kit (Favorgen, Ping-Tung, Taiwan). DNA samples were stored at -20°C until analysis. The complete ITS region, including ITS1 and ITS2 and the 5.8S region of rDNA were amplified with fungus-specific universal primers ITS1 (5 0 -TCCGTA GGTGAACCTGCGG-3 0 ) and ITS4 (5 0 -TCCTCCGC TTATTGATATGC-3 0 ). The reaction was performed in a final volume of 15 ll including 1 ll fungal DNA, 2 ll 10 9 DreamTaq buffer, 0.5 ll dNTP (10 mM), 0.5 ll forward and reverse primers (10 lM each), 0.1 ll DreamTaq DNA polymerase (5 U/ll; Thermo Fisher Scientific, Waltham, MA, USA) and 10.4 ll distilled water. The reaction conditions were as follows: initial denaturation step at 95°C for 3 min, 40 cycles of 95°C for 30 s, annealing at 50°C for 30 s, extension at 72°C for 1 min, and a final extension step at 72°C for 10 min. Both strands of purified gene fragments were sequenced using ITS1 and ITS4 primers with BigDye Terminator v3.1. cycle sequencing kit (Thermo Fisher Scientific, Waltham, MA, USA) on an ABI Prism 3130 Genetic Analyzer (Applied Biosystems). Sequences were edited and assembled using MEGA v6 software [23] and compared with sequences in database by the BLAST sequence analysis tool (https://blast.ncbi.nlm.nih.gov/ Blast.cgi).

Multilocus Sequence Typing of C. krusei Isolates
The MLST scheme employed for C. krusei genotyping was based on partial amplification and sequencing of six protein-coding genes (HIS3, LEU2, NMT1, TRP1, ADE2, and LYS2D) according to the international reference MLST scheme established for this species. Six independent PCR amplifications were performed for each isolate. The primer sets and their amplicon lengths were described in detail elsewhere [18]. Experimental conditions used in PCRs and Sanger sequencing were detailed above.
Heterozygosity was identified by the presence of two coincident peaks at the same polymorphic loci on the forward and reverse sequence chromatograms and the consensus sequences of six loci of all isolates were defined [24]. Each nucleotide sequence was compared to those in the MLST database (https://pubmlst.org/ organisms/candida-krusei) [25]. Sequence variation in each locus was assigned an allele number. Those sequences that did not match with any of the preexisting sequences was given a new allele number by the curator. The combination of the six allele numbers defined a unique sequence type (ST) of an isolate.

Phylogenetic and Population Structure Analysis
To assess the intraspecific variability of the ITS region of C. krusei isolates as a universal fungal barcode, we selected sequences deposited in GenBank for phylogenetic analysis. Sequences that were not long enough (\ 470 bp) were eliminated. Eight ITS sequence were excluded from the analysis as they were more closely related to Candida pseudolambica, Pichia bruneiensis and Pichia fermentans than to C. krusei (MN710485-MN710491, MT781361). Neighbor-joining (NJ) tree was constructed from 268 ITS sequences using JC ? G substitution model with bootstrap of 1,000 replications. As NJ tree showed minimal differences between sequences, phylogenetic network was also generalized that better model the phylogenetic relations between isolates by creating haplotypes from diploid sequence data and considering the possibility of recombination, hybridization, gene conversion and gene transfer. Sequence polymorphism analysis was carried out with DnaSP version 6 software and haplotype network file was created [26]. Haplotypes were defined as unique combination of SNPs along a particular sequence. Haplotype diversity values range from 0 to 1, with those closer to 1 indicating higher variability. Haplotype network was generated by median-joining method [27] implemented in NET-WORK v4.6.1.0 (Fluxus Technology, Suffolk, UK).
For multilocus sequence analysis, the sequenced loci were concatenated into a single sequence and single nucleotide polymorphisms (SNPs) were converted as described by  [28] to label homozygous and heterozygous sites in order to allow the cluster analysis of diploid sequence data. Briefly, each base at the polymorphic sites rewritten twice for a homozygous (A, C, G, or T) datum or once each for the two component bases for a heterozygous (K, M, R, S, W, or Y) datum. All sequences available in the MLST database were involved in phylogenetic analysis. The genetic relatedness between the investigated strains was evaluated by unweighted pair group method with arithmetic averages (UPGMA) algorithm with p-distance model in MEGA v6 software. Random bootstrapping of 1,000 operations was used for the construction. Clonal complexes (CCs) of isolates that differed in sequence at only one of the six loci (single-locus variant analysis) were predicted by the goeBURST algorithm (http://www. phyloviz.net/goeburst/) to investigate the evolutionary relationships between isolates. CC is a set of STs that are all believed to be descended from the same founding genotype. STs that could not be assigned to any group were called singletons. Haplotype diversity was calculated and possible recombination events were estimated using the DnaSP v6 software [26,29].

Results
Based on aligned ITS sequences, 76 variable sites and 32 haplotypes with haplotype diversity of 0.3869 were determined. Low intraspecific diversity was found among isolates. Most of the sequences were identical and belonged to 3 haplotypes (H1, H3 and H17). Our samples also clustered to the H1 group along with reference strain ATCC 6258. The ITS haplotype network yielded a high frequency haplotype at the centre of the network (H1; n = 204), followed by H3 (n = 26), H17 (n = 4) and other haplotypes with a The black dots (median vectors) represent unsampled or extinct haplotypes in the population single sequence indicating that various selective pressure might resulted in population expansion from H1. Several haplotypes differed in only one nucleotide from each other which is evident in network analysis but not in NJ phylogenetic reconstruction (Fig. 1).
Concatenated sequence data from the six partial genes yielded a sequence of 2,847 bp for each isolate whose sequences were determined in this study. In two animal-derived isolates a shared new allele was identified in LEU2. Four isolates were assigned to previously described STs whereas another five isolates were classified in new MLST genotypes (ST numbers: 201-203) ( Table 1). Together with our newly submitted data, a total of 203 STs available in the C. krusei MLST database were investigated in the study. Overall, 76 polymorphic sites were found among all examined loci. The NMT1 locus produced the highest number of polymorphic sites (n = 35), while HIS3 displayed the lowest (n = 23). The most commonly encountered strain types were ST 17 (13 isolates) and ST 67 (18 isolates). These isolates were obtained from human blood or an animal source and the geographical origins of isolation were Austria, Canada, Czech Republic, France, Hungary, Italy, Taiwan, United Kingdom, and Venezuela. Other STs were represented by less than 10 isolates. In order to reveal the phylogenetic relationship between isolates, we performed cluster analysis of STs submitted to the MLST database by the UPGMA method. C. krusei STs could be subgrouped into five clusters with an approach described by Jacobsen et al. (2007) [18], however, the bootstrap values for the nodes in the dendrogram were extremely low hampering the accurate distinction of internal structure. The largest group was subtype 1 containing 52.06% of STs, followed by subtype 2 (28.35%), subtype 4 (15.46%), subtype 3 (3.61%) and subtype 5 (0.51%) (Fig. 2). STs from human blood and animal source were the main components of group subtype 1, although, isolates from animals were submitted only from two countries (France and Hungary). Subtype 2 predominantly consisted of STs from the oropharynx (including sputum and bronchoalveolar lavage) of human patients. In contrast to other members of the group subtype 3, the anatomical origin of ST 49 and ST 87 was known (oropharynx/ other superficial source and blood, respectively), while subtype 4 was prevailingly composed of blood isolates. One ST obtained from an animal formed group subtype 5 alone. In general, several STs were identified in all subtypes whose origin was unknown ( Table 2, Fig. 2). Unfortunately, information on STs from different geographical origin is incomplete and thus, the correlation between subtypes and host specificity or anatomical source could not be determined (Fig. 3).
The goeBURST algorithm was used to identify putative ancestral STs and their closest relations. Evaluation of genotypic relationship of strains resulted 18 CCs and 94 singletons. CCs generated by goeBURST were numbered starting from 0 (for the  were detected from both humans and animals and these genotypes were mostly assigned as probable ancestors of CCs (Fig. 4). To provide a more accurate representation of the relationships among genetic groups, haplotype analysis of each six coding regions and the concatenated sequences were carried out and recombination events were estimated, which revealed evidence of potential recombination among the loci assessed suggesting the mixed ancestry of strains (Table 3).

Discussion
Conserved and variable rDNA regions are suitable and universally applied for comparative analysis to identify organisms at taxonomic levels and clarify phylogenetic relationships between species and populations [30]. In recent years, many studies have been denoting some limitations in using ITS for delimiting fungal species such as the lack of differentiation of closely related species and the presence of non-homologous copies of ITS in the genome [22]. Intragenomic ITS variations that was observed in Candida and Pichia species as well may derive from recombination, gene duplication and point mutations produced by DNA replication errors [22,31,32]. However, despite these limitations, the utilization of the rDNA ITS region for sequence analysis still appears to be the most reliable method for the rapid and accurate molecular identification of fungal pathogens. Furthermore, phylogenetic, taxonomic and population dynamics studies are substantially concerned with polymorphisms in the ITS region and are proven to be particularly useful for the delineation of ascomycetous yeasts [30,33]. Comparative analysis of C. krusei ITS sequences including those deposited in public genomic databases showed low intraspecies variability as the majority of sequences belong to H1 haplotype, while the remaining 31 haplotypes are represented by relatively few sequences. The sampling efficiency for network analysis has been supported by the low number of median vectors which predicts the existence of unsampled or extinct haplotypes. Nevertheless, several haplotypes could be evolved from H1 with numerous mutational steps suggesting higher variability of ITS sequence of C. krusei than formerly noted [33]. The C. krusei strains were also studied by MLST analysis. MLST provides relevant information for population genetic studies and for epidemiologic investigations with higher level of reproducibility and minimal subjectivity in analysis compared with other technologies [24]. The genes chosen to be sequenced for MLST conventionally encoded ST that could not be assigned to any group) were marked metabolic functions that are subject to stabilizing selection, because diversifying selection may obscure relationships among isolates [24,34]. C. krusei STs in MLST database (n = 203) lag far behind the number of STs of other Candida species with diploid genome that can be genotyped by the MLST approach (C. albicans, n = 3,669; C. tropicalis, n = 1,245). Jacobsen et al. [18] devised a MLST scheme ideal for strain typing of C. krusei using 129 human isolates and distinguished 94 STs. Fourteen years later, we performed an extended phylogenetic analyses with 302 isolates and 203 STs from the C. krusei MLST database. As a result of this timely update we have gained more comprehensive insight into the evolutionary history of C. krusei using multiple analyses by inclusion of strains from different source. According to our results, similar to the previous observation, STs can be subtyped. Furthermore, the majority of strains originated from animals belonged to Subtype 1. Although certain genotypes were determined from one host (e.g. ST 153-186 originated from animals) or appeared in one continent (e.g. ST 56, ST 60, ST 61 and ST 73 were noted exclusively from Australia), no unequivocal evidence of geographical associations of STs were seen due to the comparatively low numbers of isolates typed so far. This is consistent with findings that this species is the least frequently isolated among the five most common clinically relevant Candida species and typing of strains obtained from animals and environmental sources are somewhat neglected. In addition, submitting genotyped strains to MLST database requires anatomical source of isolation. Thus, strains from other sources (e.g. milk, cheese) [35] marked as unknown prevents a better resolution of phylogenetic relations. Most of the STs that originated from animals (eg., ST 19, ST 67) were putative group founders or differed only by a single allele when using cluster analysis (eg., ST 24,ST 182,ST 184,ST 186) suggesting that these strains were less exposed to changing environmental conditions. The high rate of singletons found in eBURST analysis and the evidence of recombination, which might be an explanation of lower bootstrap values in the UPGMA dendrogram, further supports the unexclusively clonal reproduction of C. krusei, a finding that coincided with previously published data [15,18].
In conclusion, studies on population structure of C. krusei are currently incomplete. Although recombination events may play some role in the observed high intraspecies genetic variability, the limited data of C. krusei genotypes from different countries submitted to MSLT database make it difficult to reveal accurate evolutionary routes of commensal and pathogenic strains or species-specific lineages. Some genotypes were identified from humans and animals that may indicate certain genotypes are better adapted to diverse environmental niches. A significant increase of sequence data together with isolate-associated metadata in the MLST database will be needed for a more robust resolution of the population structure of C. krusei.
Author Contributions MD and KB designed the study. LM provided samples and data. MD performed experiments, data analysis and prepared the first manuscript draft. All authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. Data Availability Generated ITS sequences were deposited in GenBank under the following accession numbers: OL470260-OL470268. Allele sequences and concatenated sequences of the six genes used for MLST are available at: https://pubmlst.org/ organisms/candida-krusei. The converted concatenated dataset of loci used for cluster analysis can be found in Supplementary file 1.

Declarations
Conflict of interest The authors report no conflicts of interest.
Ethical Approval The authors confirm that no ethical approval was required.
Human and Animal Rights This work was carried out with diagnostic samples and no animal experimentation was conducted.
Consent to Participate The authors alone are responsible for the content and writing of this paper.

Consent for Publication Not applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.