Background

Most commercial alcoholic fermentations are currently performed by yeast from the genus Saccharomyces with the most common species being Saccharomyces cerevisiae. The domestication of S. cerevisiae is thought to have begun as early as prehistoric times [1]. To date, many commercially available strains have been selected for fermentation in harsh conditions, such as those encountered during wine, beer, and industrial bioethanol fermentations [2,3,4]. In parallel with Saccharomyces, a distantly related genus of budding yeasts, Brettanomyces (teleomorph Dekkera), has also convergently evolved to occupy this same fermentative niche [5].

There are currently five accepted species of Brettanomyces: B. anomalus, B. bruxellensis, B. custersianus, B. naardenensis, and B. nanus [6]. A sixth species, Brettanomyces acidodurans, was recently described, although it has only been tentatively assigned to this genus due to its high genetic divergence relative to the other five species, and it has not been included in this study [7]. Brettanomyces species were originally characterized with a combination of morphological, physiological, and chemotaxonomical traits [8], although the phylogeny has since been defined and updated using several methodologies, often with conflicting results [8,9,10]. Three different phylogenies were originally presented based on analyses of the 18S or 26S ribosomal RNA sequences, which showed conflicting placement of B. custersianus and B. naardenensis [8]. Four additional phylogenies, based on either 18S or 26S RNA, or on the concatenated sequences for SSU, LSU, and elongation factor 1α, have also been published [9, 10]. These show a consistent placement for B. custersianus but somewhat inconsistent branching and poor branch support for B. naardenensis and B. nanus.

Brettanomyces spp. are most commonly associated with spoilage in beer, wine, and soft drinks due to the production of many off-flavour metabolites including acetic acid, and vinyl- and ethyl-phenols [5, 11, 12]. However, Brettanomyces can also represent an important and favorable component of traditional Belgian Lambic beers [13, 14], and their use has increased in recent years in the craft brewing industry [15]. Furthermore, B. bruxellensis has shown potential in bioethanol production by outcompeting S. cerevisiae and for its ability to utilize novel substrates [16, 17].

B. bruxellensis and, to a lesser extent, B. anomalus are the main species encountered during wine and beer fermentation, which has led to the majority of Brettanomyces research focusing only on these two species. The initial assembly of the triploid B. bruxellensis strain AWRI1499 [18] has enabled genomics to facilitate research on this organism [19,20,21,22,23]. Subsequent efforts have seen the B. bruxellensis genome resolved to chromosome-level scaffolds [24]. In contrast, the assemblies that are available for B. anomalus [25], B. custersianus, and B. naardenensis are less contiguous and mostly un-annotated, while no genome assembly is currently available for B. nanus.

Brettanomyces genomes have been shown to vary considerably in terms of ploidy and karyotype, with haploid, diploid, and triploid strains of B. bruxellensis being observed [22, 26]. In addition to ploidy variation, karyotypes can also vary widely, with chromosomal numbers in B. bruxellensis being estimated to range between 4 and 9 depending on the strain [27]. Currently available assemblies for Brettanomyces range in size from 10.2 Mb for B. custersianus to between 11.8 Mb and 15.4 Mb for B. bruxellensis (based on haploid genome size).

Recent advancements in third-generation long-read sequencing have enabled the rapid production of highly accurate and contiguous genome assemblies, particularly for microorganisms (reviewed in [28]). This study sought to fill knowledge gaps for various Brettanomyces species by sequencing and assembling genomes using current-generation long-read sequencing technologies [29], and then to use these new assemblies to explore the genomic adaptations that have taken place across the Brettanomyces genus.

Results

New genome assemblies for the Brettanomyces genus

Information regarding the species and strains used in this study is listed in Table 1. In the interest of obtaining high-quality and contiguous assemblies, haploid or homozygous strains were favored (the B. anomalus strain was the exception), with strains that featured in past studies prioritized. All strains had been isolated from commercial beverage products, with three from commercial fermentations.

Table 1 Strain details and growth conditions

Haploid assemblies were produced for all the Brettanomyces species (genome assembly summary statistics are shown in Table 2 and MinION sequencing statistics are available in Table S1). Genome sizes for B. bruxellensis and B. anomalus of 13.2 and 13.7 Mb, respectively, were well within the range of other publicly-available Brettanomyces assemblies, which range from 11.8 Mb to 15.4 Mb [18, 24, 25, 34, 35]. The B. custersianus assembly size was 10.7 Mb, similar to assemblies of other B. custersianus strains (10.2 Mb to 10.4 Mb) [36]. The B. naardenensis assembly was 11.16 Mb, highly similar to the only other published assembly [37]. The B. nanus assembly was the smallest at only 10.2 Mb and represents the first whole-genome sequence for this species.

Table 2 Assembly and BUSCO summary statistics for the haploid assemblies

The overall contiguity of the assemblies varies due to differences in heterozygosity and sequencing read lengths. The B. anomalus strain is a heterozygous diploid organism and while read coverage was high, the median read length was relatively low at 4.7 kb. This resulted in the lowest contiguity in the study consisting of 48 contigs for the haploid assembly with an N50 of 640 kb. The B. nanus strain is a haploid organism and had a much higher median read length of 14.9 kb. As such, this assembly had the best contiguity consisting of only 5 contigs with an N50 of 3.3 Mb.
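As a reminder of how the N50 values quoted above are derived, the following minimal sketch computes N50 from a list of contig lengths. The contig lengths in the example are hypothetical and are not the values from Table 2; the study itself reports using Quast for genome metrics.

```python
def n50(contig_lengths):
    """Return the N50: the length L such that contigs of length >= L
    together contain at least half of the total assembly length."""
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk through contigs from longest to shortest
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustrative only)
example_lengths = [3_300_000, 2_500_000, 2_000_000, 1_500_000, 900_000]
example_n50 = n50(example_lengths)
```

Because N50 depends only on the sorted length distribution, a few very long contigs can yield a high N50 even when many short contigs remain, which is why contig counts are reported alongside it.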

In order to assess the completeness of each assembly, BUSCO statistics were compiled for each genome (Table 2). Predicted genome completeness was high for the haploid assemblies, with between 3.8% (B. naardenensis) and 7.2% (B. anomalus) missing BUSCO genes (BGs). The assemblies were then processed with Purge Haplotigs [38] to remove duplicated and artifactual contigs. Duplication was low for not only the homozygous strains but also for the heterozygous B. anomalus assembly with between 0.5% (B. nanus) and 1.2% (B. anomalus) duplicate BGs.

Given the significant differences in the genome sizes within the Brettanomyces genus, it was of interest to determine if this size range was due to differences in overall gene number, gene compactness or both. The total number of predicted genes, gene densities (the percent of genome that is genic) and the number of orthogroups with multiple entries were calculated for each Brettanomyces genome, in addition to S. cerevisiae as a point of comparison (Table S2). B. nanus (smallest genome) had the fewest genes (5083), the highest gene density (78.1% genic) and the lowest number of expanded orthogroups (5.2%). Conversely, B. anomalus (largest genome) exhibited the highest number of genes (5735), the most ortholog duplicates (10.4%) and the largest proportion of intergenic sequences (62.2% genic).
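Gene density, as used above, is the percentage of the genome covered by gene models. A minimal sketch of that calculation is given below, assuming gene coordinates are 0-based, half-open intervals and that overlapping gene models are counted once; the exact definition used for Table S2 (for example, the treatment of introns) is not stated here, so this is an illustration rather than the study's procedure. All contig names and coordinates are hypothetical.

```python
def percent_genic(genes, contig_lengths):
    """Percent of the genome covered by at least one gene model.

    genes: iterable of (contig, start, end), 0-based half-open.
    contig_lengths: dict mapping contig name -> length in bp.
    """
    by_contig = {}
    for contig, start, end in genes:
        by_contig.setdefault(contig, []).append((start, end))

    covered = 0
    for intervals in by_contig.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or adjacent gene models: extend the current block
                cur_end = max(cur_end, end)
            else:
                covered += cur_end - cur_start
                cur_start, cur_end = start, end
        covered += cur_end - cur_start

    return 100.0 * covered / sum(contig_lengths.values())


# Hypothetical example: two contigs of 1 kb each
example_genes = [("c1", 0, 400), ("c1", 300, 600), ("c2", 100, 300)]
example_density = percent_genic(example_genes, {"c1": 1000, "c2": 1000})
```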

Given the heterozygous nature of the B. anomalus genome, a diploid assembly was also generated for the strain AWRI953. The resultant diploid assembly was approximately twice the size of the haploid assembly and had a slightly improved N50 of 730 kb. While the genome size doubled, duplicated BGs only increased from 1.2% for the haploid assembly to 35.9% for the diploid assembly. In an ideal scenario, in which both alleles are faithfully separated, duplicated BGs would be closer to 100%. The low number of duplicated BGs was found to mainly be the result of a number of fragmented gene models being present in one of the two haplomes. It should be noted that while the diploid B. anomalus assembly is split into Haplome 1 (H1) and Haplome 2 (H2), these haplomes consist of mosaics of both parental haplotypes. This is an unavoidable artefact of assembly where haplotype switching can randomly occur due to breaks in heterozygosity, and between chromosomes.

Taxonomy of Brettanomyces

This collection of high-quality Brettanomyces genomes allowed for a comprehensive phylogeny to be generated, which utilized the entire genome, as opposed to extrapolating relationships based upon ribosomal sequences. Codon-based alignments were produced for 3482 single-copy orthologues (SCOs) that were common across the five Brettanomyces species, in addition to using Ogataea polymorpha (closest available non-Brettanomyces genome) as an outgroup. These concatenated alignments were used to calculate a maximum-likelihood tree (Fig. 1a) and to estimate average nucleotide identity (ANI) between pairs of genomes (Table 3). Individual gene trees were also generated for all SCO groups. These individual gene trees were then used to generate a coalescence-based phylogeny (Figure S1a) to check for consistency with, and to generate branch support values for, the concatenation-based phylogeny. As a point of comparison, this phylogenetic methodology was also performed on the members of the Saccharomyces genus (Fig. 1b, Figure S1b, and Table 4).

Fig. 1

Phylogenies of Brettanomyces and Saccharomyces species. Rooted, maximum likelihood trees were calculated for Brettanomyces species with Ogataea polymorpha as an outgroup (a) and Saccharomyces species with Naumovozyma castellii as an outgroup (b). The phylogenies were calculated from concatenated codon alignments of single copy orthologs. IQ-TREE’s ultrafast Bootstrap values are calculated from 1000 replications and are shown at branch nodes in red. Branch support calculated from individual gene trees is shown at branch nodes in blue. The two phylogenies are transformed to the same scale (substitutions per site)

Table 3 Average Nucleotide Identities (percent) between Brettanomyces species and Ogataea polymorpha concatenated single copy ortholog codon alignments
Table 4 Average Nucleotide Identities (percent) between Saccharomyces species and Naumovozyma castellii concatenated single copy ortholog codon alignments

When compared to the distances between the members of the genus Saccharomyces, there is a much larger genetic distance separating the various Brettanomyces species. Indeed, there is a greater genetic distance between most of the Brettanomyces species than there is between any of the individual Saccharomyces species and the outgroup used for that phylogeny (Naumovozyma castellii). The largest separation was observed between B. nanus and B. bruxellensis, which presented an ANI of only 60.6%. The closest relationship between any two Brettanomyces species was between B. bruxellensis and B. anomalus with an ANI of 77.1%, followed by B. nanus and B. naardenensis with an ANI of 66.4%. The remainder of pairwise ANIs ranged between 60.6 and 61.3%. For comparison, pairwise ANIs calculated between each of the Saccharomyces species and the outgroup (N. castellii) ranged between 61.4% (S. kudriavzevii) and 61.6% (S. cerevisiae). Furthermore, the genetic distance between the most distantly related Saccharomyces species (S. cerevisiae and S. eubayanus, ANI of 79.9%) is less than the genetic distance between the most closely related Brettanomyces species.
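To make the ANI comparisons above concrete, the sketch below computes percent nucleotide identity between two rows of an alignment. It assumes that only columns where both sequences carry an unambiguous base (A, C, G, or T) are compared and that gapped columns are skipped; how the tool used in this study treats gaps and ambiguous bases is not stated here, so this is an illustration only. The sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two aligned nucleotide sequences,
    ignoring columns with gaps or ambiguous bases (an assumption)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "ACGT" and b in "ACGT":
            compared += 1
            if a == b:
                matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: 9 comparable columns, 7 identical
example_identity = percent_identity("ATGGCA---TTA", "ATGACAGGGTTT")
```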

Extensive rearrangements are present throughout Brettanomyces genomes

In order to ascertain if larger-scale differences accompanied the extensive nucleotide diversity that was observed between the Brettanomyces species, whole-genome alignments were used to detect structural rearrangements between the genomes (Fig. 2). There were numerous small and several large translocations present between the B. bruxellensis and the B. anomalus assemblies (Fig. 2a) with a total of 71 syntenic blocks identified. The B. bruxellensis and B. custersianus assemblies showed less overall synteny, with the alignment broken into 93 syntenic blocks (although individual translocation units appear to be smaller; Figure S2). Comparing B. bruxellensis to the more distantly related species B. naardenensis (Fig. 2b) and B. nanus (Fig. 2c), these breaks in synteny are also common, with 91 and 117 syntenic blocks observed, respectively. The chromosomal rearrangements were also not limited to a single species or clade; when comparing B. nanus to B. naardenensis (Fig. 2d) there were 73 syntenic blocks identified, very similar to that occurring between B. bruxellensis and B. anomalus.

Fig. 2

Synteny between haploid assemblies of Brettanomyces, visualized as Circos plots. Reference assembly contigs are coloured sequentially. Alignments are coloured according to the reference assembly contigs and are layered by alignment length. The query assembly contigs are coloured grey. Alignments are depicted between B. bruxellensis and B. anomalus (a), B. bruxellensis and B. naardenensis (b), B. bruxellensis and B. nanus (c), and B. nanus and B. naardenensis (d)

Given the heterozygous nature of the B. anomalus genome analyzed in this study, the genome was examined for the presence of large loss of heterozygosity (LOH) tracts. Three large contigs, comprising 2.14 Mb (15%) of the B. anomalus genome, were predicted to be homozygous (0.0353 SNPs/kb) while the rest of the genome is heterozygous (3.21 SNPs/kb) (Figure S3). The strains used in this study as reference for B. bruxellensis, B. custersianus, B. naardenensis, and B. nanus appeared homozygous as expected, with heterozygous SNP densities ranging from 0.01 SNPs/kb (B. naardenensis) to 0.05 SNPs/kb (B. bruxellensis).
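The SNP densities reported above are simple per-kilobase rates. The sketch below shows one way such densities could be computed per contig and used to flag candidate homozygous contigs. The cut-off of 0.5 SNPs/kb, the function names, and the contig data are hypothetical and chosen only for illustration; they are not taken from the study.

```python
def snp_density_per_kb(snp_count, length_bp):
    """Heterozygous SNPs per kilobase for one contig."""
    if length_bp <= 0:
        raise ValueError("contig length must be positive")
    return snp_count / (length_bp / 1000.0)


def flag_homozygous_contigs(snp_counts, contig_lengths, max_density=0.5):
    """Return the names of contigs whose SNP density is below max_density.

    max_density (SNPs/kb) is a hypothetical cut-off used for illustration.
    """
    return sorted(
        name
        for name, length in contig_lengths.items()
        if snp_density_per_kb(snp_counts.get(name, 0), length) < max_density
    )


# Hypothetical contigs: lengths in bp and heterozygous SNP counts
lengths = {"contig_A": 1_000_000, "contig_B": 500_000}
snps = {"contig_A": 35, "contig_B": 1_600}
homozygous = flag_homozygous_contigs(snps, lengths)
```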

Enrichment of fermentation-relevant genes

Given the apparent adaptation of Brettanomyces to the fermentative environment, each Brettanomyces genome was investigated for the presence of specific gene family expansions (Table 5). Both B. bruxellensis and B. nanus were predicted to have undergone copy number expansion of ORFs predicted to encode oligo-1,6-glucosidase enzymes (EC 3.2.1.10), which are commonly associated with starch and galactose metabolism (Fig. 3a). B. nanus was also predicted to possess an expanded set of genes encoding β-glucosidase (EC 3.2.1.21; Fig. 3b) and β-galactosidase (EC 3.2.1.23; Fig. 3c) activities, which are involved in the utilization of sugars from complex polysaccharides.

Table 5 Expanded gene families in Brettanomyces
Fig. 3

Phylogenies of several enriched orthogroups in Brettanomyces. Broken gene models or pseudo-genes are indicated as half circles. The enriched gene orthogroups are: oligo-1,6-glucosidase (EC 3.2.1.10) (a), β-glucosidase (EC 3.2.1.21) (b), β-galactosidase (EC 3.2.1.23) (c), and sarcosine oxidase (EC 1.5.3.1/1.5.3.7) (d). Phylogenies are scaled by substitutions per site

B. custersianus and B. bruxellensis presented large expansions (10 and 6 copies respectively) of genes encoding sarcosine oxidase / L-pipecolate oxidase (PIPOX) (EC 1.5.3.1/1.5.3.7) and the remaining Brettanomyces species also contained multiple copies of this gene (Fig. 3d). PIPOX exhibits broad substrate specificity, but primarily catalyzes the breakdown of sarcosine to glycine and formaldehyde, in addition to the oxidation of L-pipecolate [39]. It has been shown that PIPOX also acts on numerous other N-methyl amino acids such as N-methyl-L-alanine, N-ethylglycine, and more importantly from a winemaking perspective, both L- and D-proline [39,40,41,42].

In addition to PIPOX, B. bruxellensis and B. anomalus share an expansion of S-formylglutathione hydrolase (EC 3.1.2.12), and B. anomalus contains an expansion of formate dehydrogenase (EC 1.17.1.9). These genes are part of methanol metabolism in other species (a capability lost in Brettanomyces) and are also involved with the metabolism of formaldehyde (a common metabolite during fermentation). Lastly, B. naardenensis contains an expansion of a gene encoding sulfonate dioxygenase (EC 1.14.11.-) activity, associated with the utilization of alternative sulphur sources, and an expansion of acetylornithine deacetylase (EC 3.5.1.16), a component of the arginine biosynthetic pathway.

Horizontal gene transfer enables sucrose utilization in B. bruxellensis and B. anomalus

Potential horizontal gene transfer (HGT) events that may have contributed to the evolution of Brettanomyces were investigated. Twelve Brettanomyces orthogroups were predicted to be the result of HGT from bacteria (Table 6). Of these bacterially derived gene families, a Glycoside Hydrolase family 32 gene (GH32), which was predicted to have β-fructofuranosidase activity (EC 3.2.1.26), is likely to have had a key phenotypic impact during the evolution of this genus. GH32 enzymes hydrolyse glycosidic bonds, and β-fructofuranosidase (invertase) is specifically responsible for the breakdown of sucrose into fructose and glucose monomers and is required for the utilization of sucrose as a carbon source.

Table 6 Genes predicted to occur in Brettanomyces via Horizontal Gene Transfer

To further confirm the bacterial origins of the Brettanomyces invertases, a protein-based phylogeny was created from the highest scoring eukaryote and prokaryote blast hits from the RefSeq non-redundant database, as well as from these three Brettanomyces invertases (Fig. 4a). The prokaryote and eukaryote invertases each form two distinct clades. Consistent with a bacterial-derived HGT event, the Brettanomyces invertase proteins reside within one of the two prokaryote clades and are evolutionarily distinct from the eukaryote groups. There are also three other eukaryote invertases that reside within a prokaryote clade, and two prokaryote invertases that reside within a eukaryote clade, which suggests that HGT of this important enzyme activity is not unique to Brettanomyces. To confirm the placement of the Brettanomyces invertases in the prokaryotic clade, three alternate topologies (within either of the eukaryote clades, as well as within the second prokaryote clade) were tested (Figure S4, Table S3). These constrained topologies were all significantly less likely compared to the unconstrained tree (Figure S4, Table S3).

Fig. 4

β-fructofuranosidases (invertases) from Brettanomyces. Phylogeny of invertases from Brettanomyces and the top blast hits from the RefSeq non-redundant prokaryote and eukaryote databases, scaled by substitutions per site, with Brettanomyces nodes enlarged for clarity (a). Genomic context of invertases in Brettanomyces, showing cluster of conserved genes, orange; NAG gene cluster, green; cluster of metabolic genes, blue; Invertase, red (b)

The genomic context of the invertases present in B. bruxellensis and B. anomalus was also examined. These genes are predicted to reside within sub-telomeric regions (Fig. 4b). In Brettanomyces, there is significant structural variation and a general loss of synteny, which is typical of sub-telomeric regions in other species (Fig. 4b). For example, in B. nanus the NAG gene cluster resides within a different sub-telomere relative to B. bruxellensis and B. anomalus. The NAG genes are also present in B. naardenensis, but are not co-located and appear to be missing entirely in B. custersianus. Likewise, homologues of the MPH3 and TIP1 genes that are present across all the Brettanomyces species, are only found in this specific sub-telomeric region in B. bruxellensis and B. anomalus.

Discussion

New genome assemblies for the five Brettanomyces species are described, which generally exhibit significant improvements over previous assemblies produced for this genus. The most contiguous genome assembly described was that of B. nanus, which comprised only 5 contigs and had an N50 of 3.3 Mb. To the best of our knowledge, this makes the B. nanus assembly the most contiguous Brettanomyces assembly to date. When comparing the assemblies of the other species to the next most contiguous assembly available from other Brettanomyces sequencing studies, the B. anomalus assembly represents a 4.7-fold improvement over GCA_001754015.1 (261 contigs), B. custersianus a 9.4-fold improvement over GCA_001746385.1 (226 contigs) and B. naardenensis a 6.5-fold improvement over GCA_900660285.1 (104 contigs). While the predicted completeness of these new assemblies was generally high, there were also considerable differences in gene density and content. This was most prominent between B. nanus and B. anomalus, with the B. nanus genome containing fewer total genes, less intergenic sequence and lower duplication of specific orthogroups.

The high-quality genome sequences allowed for the calculation of a Brettanomyces whole-genome phylogeny. The topology of the whole-genome phylogeny generally agreed with those derived from rRNA sequences in the placement of B. bruxellensis, B. anomalus and B. custersianus [8,9,10]. However, these earlier studies were not able to consistently resolve the placement of B. nanus and B. naardenensis, with conflicting results between phylogenies based on 18S and 26S ribosomal RNA sequences. The whole-genome phylogeny proposes the Brettanomyces genus to be comprised of two clades, with B. nanus and B. naardenensis forming a clade separate from the other species. This whole-genome topology is consistent with previous 18S phylogenies. Comparison of ANI values identified that there is a larger genetic distance separating some Brettanomyces species than there is separating the Saccharomyces and Naumovozyma genera. While ANI values alone are generally insufficient for determining genus boundaries (at least in prokaryotes) [43], the extremely low ANIs that have been observed across the Brettanomyces genus merit further consideration of the taxonomy of this group and of whether it may be appropriate for the Brettanomyces genus to be refined.

B. nanus and, to a lesser extent, B. bruxellensis exhibited expansions of families of glucosidases and galactosidases that are responsible for the utilization of sugars from complex polysaccharides. These types of expansions are a hallmark of the domestication of beer and wine strains of S. cerevisiae and suggest that a similar process may be occurring in B. nanus [44,45,46]. The three known B. nanus strains were all isolated from beer samples obtained from Swedish breweries in 1952. The B. nanus strain AWRI2847 (CBS 1945) was found to have far less spoilage potential in beer than either B. bruxellensis or B. anomalus [47]. At the time this strain was isolated, microbial spoilage of beer was determined sensorially and sharing yeast samples between both individual fermentations (re-pitching) and breweries was common practice [48]. Taken together, these practices may have allowed B. nanus to remain a long-term undetected contaminant, surviving successive serial re-pitchings and spreading to multiple breweries.

The ability of Brettanomyces to grow in nutrient-depleted conditions has largely been attributed to the utilization of alternative nitrogen sources such as free nitrates and amino acids [49,50,51]. The expansion of PIPOX in Brettanomyces may be partly responsible for this important survival trait. Proline, a substrate of PIPOX, is one of the more common amino acids in fermented wine and beer. Despite this abundance, proline is poorly utilized by S. cerevisiae; however, it is readily metabolized by B. bruxellensis [52,53,54,55]. PIPOX converts proline to 1-pyrroline-2-carboxylate, which can be further converted to D-ornithine by a general aminotransferase. Unlike proline oxidase (EC 1.5.1.2) and proline dehydrogenase (EC 1.5.5.2), which convert proline to 1-pyrroline-5-carboxylate, PIPOX represents an alternative avenue for proline utilization as a nitrogen source that is less impactful to redox homeostasis, which may allow its utilization during fermentation.

Horizontal Gene Transfer (HGT) has been reported as a mechanism of adaptive evolution in fungal species and to have contributed to the domestication of S. cerevisiae [56,57,58]. Similarly, an HGT event is predicted to have conferred the ability to utilize sucrose as a carbon source to B. bruxellensis and B. anomalus via the incorporation of a bacterially-derived invertase. Previous phenotypic testing has shown B. bruxellensis and B. anomalus to be the only Brettanomyces species capable of utilizing sucrose [6] and this phenotype correlates with the presence of this HGT-derived invertase, which is only observed in the B. bruxellensis and B. anomalus genomes (there are no other invertase-encoding ORFs predicted in Brettanomyces). The genomic context illustrates further parallels to evolution in Saccharomyces. The invertases are shown to reside within sub-telomeres, which are genomic regions that have been shown to be hotspots for structural rearrangements and HGT events in Saccharomyces [59,60,61,62,63]. Sucrose utilization likely conferred a significant advantage in fruit fermentations, helping to shape the evolution of the common ancestor of B. bruxellensis and B. anomalus towards this fermentation specialization.

Conclusions

High quality genome assemblies for all five currently accepted Brettanomyces species are described, including the first assembly for B. nanus and the most contiguous assemblies available to date for B. anomalus, B. custersianus, and B. naardenensis. Comparative genome analysis established that the species are genetically distant and polyphyletic. Numerous indicators of domestication and adaptation in Brettanomyces were identified with some notable parallels to the evolution of Saccharomyces. Structural differences between the genomes of the Brettanomyces species and apparent loss of heterozygosity in B. anomalus were observed. Enrichments of fermentation-relevant genes were identified in B. anomalus, B. bruxellensis and B. nanus, as well as multiple horizontal gene transfer events in all Brettanomyces genomes, including a gene in the B. anomalus and B. bruxellensis genomes that is probably responsible for these species’ ability to utilize sucrose.

Methods

Detailed workflows, custom scripts for computational analyses and genome annotations are available at https://github.com/mroach-awri/BrettanomycesGenComp (DOI: https://doi.org/10.5281/zenodo.3632185). All sequencing reads and genome assemblies have been deposited at the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) under the BioProject: PRJNA554210. Raw FAST5-format files for all Oxford Nanopore sequencing are available from the European Bioinformatics Institute (EMBL-EBI) European Nucleotide Archive (ENA) under the study: ERP116386.

Strains and media

The five Brettanomyces strains selected for sequencing were supplied by the Australian Wine Research Institute’s wine microorganism culture collection. AWRI953 and AWRI2804 were grown in MYPG medium (0.3% malt extract, 0.3% yeast extract, 0.2% peptone, 1% glucose) at 27 °C and AWRI950, AWRI951, and AWRI2847 were grown in GPYA+CaCO3 medium (4% glucose, 0.5% peptone, 0.5% yeast extract, 1% calcium carbonate) at 25 °C.

Library preparation and sequencing

Genomic DNA was extracted from liquid cultures using a QIAGEN Gentra Puregene Yeast/Bact Kit. B. bruxellensis was sequenced using PacBio RS-II SMRT sequencing. The sequencing library for B. nanus was multiplexed with other samples (not reported here) using the SQK-LSK109 and EXP-NBD103 kits following the Oxford Nanopore protocol NBE_9065_v109_revA 23MAY2018. For the remaining species, libraries were prepared using the SQK-LSK108 kit following the protocol GDE_9002_v108_revT_18OCT2016. Sequencing was performed on a MinION using FLO-MIN106 flow-cells. Demultiplexing and base-calling were performed using Albacore v2.3.1.

Illumina sequencing was performed on each strain using a combination of short-insert (TruSeq PCR-free) and mate-pair (2–5 kb insert and 6–10 kb insert) libraries. All libraries were barcoded and pooled in a single MiSeq sequencing run using 2 × 300 bp chemistry.

Assembly

The B. bruxellensis genome in this study was assembled with Mira v4.9.3 [64] (job=genome,denovo,accurate; -NW:cac=warn; PCBIOHQ_SETTINGS; -CO:mrpg=7) using PacBio long-reads that were error corrected with Illumina paired-end and mate-pair reads using PBcR (wgs-8.3rc1) [65] with default parameters. This assembly was manually finished in DNASTAR SeqMan Pro. Haploid assemblies for all other Brettanomyces species were generated from FASTQ-format Nanopore reads using Canu v1.7 [66]. The Nanopore reads were mapped to the assemblies using minimap2 [67] and initial base-call polishing was performed with Nanopolish v0.9.2 [68], utilizing the FAST5 signal-level sequencing data. Further base-call polishing was performed with Illumina paired-end, and 2–4 kb and 6–10 kb mate-pair reads. Paired-end and mate-pair reads were mapped with BWA-MEM v0.7.12-r1039 [69] and Bowtie2 v2.2.9 [70] respectively; base-call polishing was then performed with Pilon v1.22 [71]. Finally, raw Nanopore reads were mapped to the base-call-polished assemblies and Purge Haplotigs v1.0.1 [38] was used to remove any duplicate or artefactual contigs.

A diploid assembly for AWRI953 (B. anomalus) was also generated. Paired-end reads were mapped to the haploid assembly with BWA-MEM, and high-confidence SNPs were called using VarScan v2.3.9 [72]. Nanopore reads were mapped to the assembly using BWA-MEM. Heterozygous SNPs were phased using the mapped Nanopore reads with HapCut2 commit: c2e6608 [73] and converted to VCF format with WhatsHap v0.16 [74]. New consensus sequences were called for each haplotype from the phased SNPs and the Nanopore reads were binned according to the haplotype to which they mapped best. The two B. anomalus haplotypes were then independently reassembled from the haplotype-binned Nanopore reads using the method described for the other species.

All other Brettanomyces assemblies were aligned to the B. bruxellensis assembly using NUCmer (MUMmer) v4.0.0beta2 [75]. Dotplots were visualized and contigs with split alignments were manually inspected for indications of mis-assemblies using mapped alignments of Nanopore reads and Illumina mate-pair reads. Genome metrics were calculated with Quast [76] and completeness, duplication, and fragmentation were estimated using BUSCO v3.0.2 [77] with the odb9 Saccharomyceta dataset.

Annotation

Gene models were predicted with Augustus v3.2.3 [78] using the S. cerevisiae S288C configuration. Gene models were submitted for KEGG annotation using BlastKOALA [79], and GO-terms and functional domains were annotated using InterProScan v5.32–71.0 [80]. Orthogroups were assigned with OrthoFinder v2.2.6 [81] using representative species from Saccharomycetaceae (Table S4) and also using only the haploid Brettanomyces assemblies.

Phylogeny

OrthoFinder (Brettanomyces + O. polymorpha) was used to find SCOs over these genomes. Protein sequences were aligned with Muscle v3.8.31 [82] and then converted to codon-spaced alignments using PAL2NAL [83]. Average nucleotide identities were estimated using panito commit: f65ba29 (github.com/sanger-pathogens/panito). A rooted maximum likelihood phylogeny was generated with IQ-TREE [84] on the concatenated codon alignments. IQ-TREE was also used to generate gene trees for all SCOs, and then to generate a coalescence-based phylogeny from the SCO individual gene trees. Phylogenies were created using the same method for the Saccharomyces species + N. castellii (outgroup) to serve as a comparison.
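The conversion of a protein alignment into a codon-spaced nucleotide alignment, mentioned above, can be illustrated with the minimal sketch below. It threads the codons of an ungapped coding sequence through one gapped protein row, inserting three gap characters per protein gap. This is not the PAL2NAL implementation, which performs further consistency checks; the function name and the example sequences are hypothetical.

```python
def thread_codons(aligned_protein, cds):
    """Convert one gapped protein sequence into a codon-spaced nucleotide row."""
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    residues = sum(1 for aa in aligned_protein if aa != "-")
    # Tolerate a trailing stop codon that has no residue in the protein
    if len(codons) == residues + 1 and codons[-1].upper() in {"TAA", "TAG", "TGA"}:
        codons = codons[:-1]
    if len(cds) % 3 != 0 or len(codons) != residues:
        raise ValueError("CDS length does not match the protein sequence")
    codon_iter = iter(codons)
    # Each residue consumes one codon; each protein gap becomes three gaps
    return "".join("---" if aa == "-" else next(codon_iter) for aa in aligned_protein)


# Hypothetical example: protein row "M-KT" with its coding sequence
example_row = thread_codons("M-KT", "ATGAAAACCTAA")
```

Aligning at the protein level and then threading codons keeps the reading frame intact, which matters for distantly related sequences such as those compared in this study.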

Whole genome synteny visualization

Pairwise synteny blocks were generated between the reference B. bruxellensis assembly and the other haploid assemblies, as well as between the B. naardenensis and B. nanus assemblies. Contigs were placed in chromosome order using Purge Haplotigs [38] to generate placement files that were then used to rearrange contigs. Alignments between the assemblies were calculated using NUCmer with sensitive parameters (-b 500 -c 40 -d 0.5 -g 200 -l 12). Genome windows (20 kb windows, 10 kb steps) were generated for the assemblies and a custom script was used to pair syntenic genome windows based on the NUCmer alignments. Concordant overlapping and adjacent windows were merged, and overlapping discordant windows were trimmed. The synteny blocks were then visualized using Circos v0.69.6 [85].
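The overlapping genome windows described above (20 kb windows with 10 kb steps) can be generated as in the following minimal sketch. Coordinates are 0-based and half-open, and the final window is truncated at the contig end; both choices are assumptions, and the custom script in the study's repository may handle contig ends differently.

```python
def genome_windows(contig_length, window=20_000, step=10_000):
    """Return overlapping (start, end) windows tiled along one contig.

    Coordinates are 0-based, half-open; the last window is truncated
    at the contig end (an assumption made for this illustration).
    """
    windows = []
    start = 0
    while start < contig_length:
        end = min(start + window, contig_length)
        windows.append((start, end))
        if end == contig_length:
            break
        start += step
    return windows


# Hypothetical 45 kb contig
example_windows = genome_windows(45_000)
```

With a step of half the window size, every position away from the contig ends is covered by two windows, which gives the subsequent merging and trimming of concordant and discordant windows some redundancy to work with.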

Gene enrichment

OrthoFinder (Saccharomycetaceae) annotations were used to identify gene-count differences between the Brettanomyces species. The ratio of the gene-count to the average gene-count was calculated for the Brettanomyces species over all OrthoFinder orthogroups. All orthogroups with a ratio ≥ 2 for any Brettanomyces species were subject to GO-enrichment analysis using BiNGO v3.0.3 [86] using the hypergeometric test with Bonferroni Family-Wise Error Rate (FWER) correction. Genes for overrepresented categories (p-value ≤0.05) were returned. Multiple sequence alignments were generated for GO-enriched orthogroups using Muscle and maximum likelihood phylogeny trees generated using PhyML within SeaView v4.7 [87] using default parameters (LG model, BioNJ starting tree, tree searching using NNI substitutions).
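The gene-count ratio filter described above can be sketched as follows. For each orthogroup, the count in each species is divided by the mean count across the species, and orthogroups where any species reaches a ratio of at least 2 are retained. Whether the average in the study includes the species being tested is not stated, so including it here is an assumption; the orthogroup and species labels are hypothetical.

```python
def expanded_orthogroups(gene_counts, min_ratio=2.0):
    """Return orthogroups in which any species has count / mean count >= min_ratio.

    gene_counts: dict mapping orthogroup -> {species: gene count}.
    The mean is taken over all listed species (an assumption).
    """
    flagged = {}
    for orthogroup, counts in gene_counts.items():
        mean_count = sum(counts.values()) / len(counts)
        if mean_count == 0:
            continue
        hits = {
            species: n / mean_count
            for species, n in counts.items()
            if n / mean_count >= min_ratio
        }
        if hits:
            flagged[orthogroup] = hits
    return flagged


# Hypothetical gene counts for two orthogroups
example_counts = {
    "OG_example_1": {"B_bruxellensis": 1, "B_anomalus": 1, "B_custersianus": 1,
                     "B_naardenensis": 1, "B_nanus": 6},
    "OG_example_2": {"B_bruxellensis": 1, "B_anomalus": 1, "B_custersianus": 1,
                     "B_naardenensis": 1, "B_nanus": 1},
}
example_expanded = expanded_orthogroups(example_counts)
```

Only the orthogroups passing this filter would then be carried forward to the GO-enrichment step.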

Horizontal gene transfer

HGT events were predicted for the Brettanomyces species. Protein sequences for the assemblies were used in BLAST-P searches against the RefSeqKB non-redundant Fungi and Bacteria datasets [88], and the Alien Index (AI) was calculated as described in [89]. All Brettanomyces proteins with an AI score greater than 20 were investigated further. The multiple sequence alignments and trees were retrieved for the HGT candidates’ orthogroups and several candidates were removed following manual inspection. A phylogeny was generated for one HGT prediction of interest. The Brettanomyces genes and the top blast hits from the RefSeq non-redundant database eukaryote and prokaryote datasets were aligned with Muscle, and the phylogeny was generated with IQ-TREE. Constrained trees were generated to test the Brettanomyces genes within alternate clades and these were assessed using IQ-TREE’s tree topology tests.
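The formula for the Alien Index is not reproduced in the text above. The sketch below assumes the commonly used definition, in which the index is the difference between the natural logarithms of the best E-value against the recipient lineage (here fungi) and the best E-value against the candidate donor lineage (here bacteria), each offset by a small constant of 1e-200. The formulation in [89] may differ in detail, and the E-values in the example are hypothetical.

```python
import math


def alien_index(best_fungal_evalue, best_bacterial_evalue, offset=1e-200):
    """Alien Index under the assumed definition:
    ln(best fungal E-value + offset) - ln(best bacterial E-value + offset).

    Where a protein has no hit in one database, an E-value of 1 is
    conventionally substituted (also an assumption here).
    Positive values indicate a better hit to bacteria than to fungi.
    """
    return math.log(best_fungal_evalue + offset) - math.log(best_bacterial_evalue + offset)


# Hypothetical protein: weak fungal hit (1e-5), strong bacterial hit (1e-80)
example_ai = alien_index(1e-5, 1e-80)
is_candidate = example_ai > 20  # threshold stated in the Methods
```

The offset keeps the logarithm finite when BLAST reports an E-value of zero, which is common for near-identical hits.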