IMA Genome-F 9A

Draft genome sequence of Annulohypoxylon stygium

Introduction

Annulohypoxylon stygium (Xylariales, Ascomycota) is a white-rot fungus commonly found on dead wood (Hsieh et al. 2005). It degrades lignin and carbohydrates very efficiently. Some species of Annulohypoxylon may be used in the cultivation of Tremella fuciformis, one of the foremost medicinal and culinary fungi of China (Stamets 2000). Tremella fuciformis, the white jelly mushroom, is a symbiotic fungus that does not form an edible basidiome without the presence of a specific host fungus (Li et al. 2014). Its preferred host has traditionally been referred to as “Xianghui” in China. Recently, A. stygium was identified as the main Xianghui species, and this has been confirmed experimentally (Deng et al. 2016). Cultivators usually pair cultures of T. fuciformis with this species for industrial production, because the formation of T. fuciformis basidiomes is highly dependent on the presence of the specific host fungus, both in nature and in industrial production.

To date, the symbiotic mechanism between A. stygium and T. fuciformis is not understood. The genome sequence of A. stygium generated in this study may provide useful information for revealing that mechanism.

Sequenced Strain

China: Sichuan: Tongjiang, N 31°42′, E 120°17′, alt. 1523 m, isolated from dead wood, 8 Aug. 2015, Qiang Li & Chuan Xiong (MG137 — dried culture).

Nucleotide Sequence Accession Number

The Whole Genome Shotgun project for this isolate (culture collection number SAAS137) has been deposited at DDBJ/EMBL/GenBank under accession number PYLT00000000. The version described in this paper is version PYLT01000000.

Materials and Methods

Annulohypoxylon stygium MG137 was isolated from dead wood in Tongjiang, Sichuan province, China, and was preserved in the Fungal Culture Collection, Center in Biotechnology and Nuclear Technology Research Institute, Chengdu, Sichuan, China. Genomic DNA was extracted from this isolate and sequenced on the Genome Analyzer IIx next-generation sequencing platform (Illumina) at the Beijing Genomics Institute (Shenzhen, China). Paired-end libraries with insert sizes of 425 bp and 725 bp were used to generate reads of 150 bases. CLC Genomics Workbench v. 6.0.1 (CLCBio, Aarhus, Denmark) was subsequently used to trim reads of poor quality (limit of 0.05) as well as terminal nucleotides. The remaining reads were assembled using SPAdes v. 3.0.0 with an optimized k-mer value of 103 (Bankevich et al. 2012). Thereafter, scaffolding was completed using SSPACE v. 2.0 (Boetzer et al. 2011) and gaps were reduced using GapFiller v. 2.2.1 (Boetzer & Pirovano 2012). The completeness of the assembly was evaluated using BUSCO v. 3 (Simão et al. 2015). Homology-based and ab initio gene prediction were combined to identify A. stygium gene models. Homologous proteins from Laccaria bicolor were aligned to the repeat-masked A. stygium genome using Exonerate v. 2.2.0 (Slater & Birney 2005). The filtered alignments (above 300 bp and 90 % coverage) were used as training models for ab initio gene prediction. The ab initio prediction was conducted using Augustus v. 3.2.3 (Stanke et al. 2008) and GeneMark-ES (Ter-Hovhannisyan et al. 2008), guided by the training models from the homology-based alignments. All gene prediction results were integrated into the final gene models by EVidenceModeler (Haas et al. 2008). Carbohydrate-active enzymes (CAZymes), including the repertoire of auxiliary enzymes, were predicted using dbCAN (Yin et al. 2012).

Results and Discussion

The genome of A. stygium had an estimated size of 47.5 Mb with an average coverage of 31.26× (Table 1). The N50 size was 598 310 bases, and the assembly had a mean GC content of 46 %. The total number of scaffolds generated was 1854. MAKER predicted a total of 12 498 genes with an average length of 1662 bp. The average gene density of A. stygium was 263 genes/Mb. A phylogenetic analysis of the genus Annulohypoxylon and the closely related genus Hypoxylon is presented to reflect the position of this genome (Fig. 1).
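The summary statistics above follow standard definitions, which the following minimal sketch makes explicit. The function names and the toy scaffold lengths are hypothetical and serve only for illustration; the gene count (12 498) and estimated genome size (47.5 Mb) are the values reported in the text.

```python
def n50(lengths):
    """Return the N50 of a list of scaffold lengths.

    N50 is the length L such that scaffolds of length >= L together
    contain at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("no scaffold lengths supplied")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def gene_density(n_genes, genome_size_mb):
    """Return the number of predicted genes per megabase."""
    return n_genes / genome_size_mb


# Hypothetical scaffold lengths, used only to exercise the function.
toy_scaffolds = [50, 40, 30, 20, 10]
toy_n50 = n50(toy_scaffolds)

# Values reported in the text for A. stygium MG137.
density = gene_density(12498, 47.5)
print(f"toy N50 = {toy_n50}; gene density = {density:.0f} genes/Mb")
```

With the reported values, the density evaluates to roughly 263 genes/Mb, in agreement with the figure given above.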

Table 1 Whole genome DNA sequence assemblies generated in Annulohypoxylon stygium MG137. The genomes of A. stygium MG137 were generated using next generation sequencing technology.
Fig. 1

Maximum Likelihood (ML) phylogenetic analysis of the genus Annulohypoxylon and the closely related genus Hypoxylon using MEGA 6.06 based on partial gene sequences of β-tubulin. Bootstrap values were calculated using 1000 replicates to assess node support. The Annulohypoxylon stygium sequence used for verification was extracted from the assembled genome. Reference sequences were obtained from the NCBI database and are shown with their accession numbers.

The draft genome of A. stygium is larger than that of the allied species Xylaria hypoxylon OSC100004 and Hypoxylon sp. CI-4A (Wu et al. 2017), which are 42.9 Mb and 37.7 Mb, respectively. The genome is closer in size to that of Hypoxylon sp. CO27-5 and Hypoxylon sp. EC38 (Wu et al. 2017), which have genome sizes of 46.6 Mb and 47.7 Mb, respectively. Annulohypoxylon stygium also has a similar number of putative genes when compared to Hypoxylon sp. EC38 (12 261 predicted gene models) and Hypoxylon sp. CO27-5 (12 256 predicted gene models).

A total of 757 CAZymes were identified in the genome of A. stygium, more than in the closely related Hypoxylon sp. CO27-5 (599 CAZymes) and Hypoxylon sp. CI-4A (526 CAZymes). The number of CAZymes in A. stygium was much higher than that in Tremella encephala (265 CAZymes; Magnuson et al. 2017) and T. mesenterica (206 CAZymes; Floudas et al. 2012), suggesting that A. stygium may assist Tremella species in the degradation of lignin and carbohydrates, both in nature and in industrial production. The genome sequence data of A. stygium in this study will provide useful information for understanding the mechanism of the symbiotic interaction between A. stygium and T. fuciformis.

Authors: Q. Li, X. Ma, H. Li, C. Xiong, Y. Gao, Y. Dong*, and W. Huang*

*Contact: Yang Dong: loyalyang@163.com; Wenli Huang: wenlih11@126.com

IMA Genome-F 9B

Draft genome sequence of Aspergillus mulundensis, a fungus that produces mulundocandins

Introduction

A strain of Aspergillus (Y-30462 = DSMZ 5745) was isolated at Hoechst India, then located in the Mulund district of Mumbai, India, from a soil sample collected in Bangladesh (Mukhopadhyay et al. 1987, Roy et al. 1987). In the original publication, the fungus was described as an unusual variant of A. sydowii because of the presence of abundant Hülle cells and was published without a Latin description or type specimen as “A. sydowii var. mulundensis”. This strain was subsequently re-examined using multi-gene phylogenetic analysis, chemotaxonomic markers, and morphological data and was determined as representing a novel species within Aspergillus sect. Nidulantes (Bills et al. 2016, Chen et al. 2016).

The primary objective for sequencing the genome of A. mulundensis was the identification of the gene cluster encoding the biosynthesis of the mulundocandins (Yue et al. 2015). Mulundocandin and deoxymulundocandin (Fig. 2) are lipohexapeptides and potent antifungal antibiotics of the echinocandin class (Mukhopadhyay et al. 1987, Roy et al. 1987, Mukhopadhyay et al. 1992). Biosynthetically, they are closely related to echinocandin B, but they differ in the substitution of serine instead of threonine in the fifth position of the hexapeptide core and by a 12-methyl myristoyl side chain instead of a linoleoyl side chain. Mulundocandin and deoxymulundocandin have been investigated extensively as potential lead structures for the development of echinocandin-type antifungal drugs (Mukhopadhyay et al. 1992, Hawser et al. 1999, Lal et al. 2003, 2004). This draft genome will expand genomic data sets for comparative genomics of species in Aspergillus sect. Nidulantes.

Fig. 2

Some naturally occurring echinocandins described in the patent literature.

Sequenced Strain

Bangladesh: unknown location, isolated from soil at Hoechst (Mumbai, India) (Hoechst Y-30462 = DSMZ 5745 = CBS 140610 = IBT 33104).

Nucleotide Sequence Accession Number

The Aspergillus mulundensis isolate DSMZ 5745 Whole Genome Shotgun project has been deposited in GenBank under the accession number PVWQ00000000.

Materials and Methods

For methods for DNA extraction, sequencing, and genome assembly and annotation, see Bills et al. (2016).

Results and Discussion

The genome of DSMZ 5745 was sequenced to 100-fold coverage, yielding 160 scaffolds with an N50 of 2.8 Mb (Table 2). The assembled genome size was 45 Mb, and a total of 11 603 genes were predicted. The GC content of this genome is 43.2 %. The genome contains 53 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These genes include 25 PKSs, 19 NRPSs, one PKS-NRPS hybrid, four dimethylallyl tryptophan synthases, and four terpene synthases. They are distributed among 45 putative gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, 14 secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. Furthermore, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue et al. 2015) was recognized and predicted to be responsible for the biosynthesis of mulundocandins. The nuclear-encoded secondary metabolomes of A. mulundensis and A. nidulans FGSC A4 were compared previously (Bills et al. 2016). A phylogenetic tree reflecting the position of this genome in relation to other Aspergillus species is presented (Fig. 3).

Table 2 General features of the genomes of Coleophoma cylindrospora BP6252 and BP5796, Phialophora cf. hyalina BP5553, and Aspergillus mulundensis DSMZ 5745.
Fig. 3

Maximum Likelihood tree of ex-type and authentic strains of Aspergillus sect. Nidulantes (25 strains) inferred based on an alignment of the concatenated sequences of the ITS-28S rDNA, ribosomal polymerase II, β-tubulin, and calmodulin genes. Data were resampled from Chen et al. (2016). DSMZ 5745 is labelled in red, and A. unguis was positioned as the outgroup. The Maximum Likelihood tree was based on the Tamura-Nei model. The tree with the highest log likelihood (−13 959.85) is shown. Branches are labelled with the percentage of trees in which the associated taxa clustered together. A discrete gamma distribution was used to model evolutionary rate differences among sites (5 categories (+G, parameter = 0.2371)). Branch lengths were measured in the number of substitutions/site. The dataset included 3329 positions. Data were analyzed in MEGA7 (Kumar et al. 2016). “Type” = ex-type cultures.

Authors: Q. Yue, Y. Li, L. Chen, X. Zhang, K. Li, J. Sun, X. Liu, Z. An, and G.F. Bills*

*Contact: billsge@vt.edu

IMA Genome-F 9C

Draft genome sequence of the root pathogen Berkeleyomyces basicola (syn. Thielaviopsis basicola)

Introduction

Berkeleyomyces basicola (Ascomycota: Microascales), previously known as Thielaviopsis basicola (Nel et al. 2017), is an important plant pathogen responsible for root rot of many important agricultural and ornamental plants (Johnson 1916, Stover 1950, Nehl et al. 2004, Pereg 2013). Since its description in the mid-1800s (Berkeley & Broome 1850), there has been considerable debate surrounding its appropriate taxonomic placement resulting in numerous name changes. The phylogenetic re-evaluation of Ceratocystidaceae by De Beer et al. (2014) raised new questions regarding the appropriate taxonomic placement of the species. Their results suggested that T. basicola did not group in Thielaviopsis or any other genus described in the family. Because the authors included only the sequence data of a single isolate in their analyses they concluded that no taxonomic changes could be made without further study. In a recent investigation, Nel et al. (2017) confirmed that T. basicola represented a distinct generic lineage in Ceratocystidaceae and introduced the new generic name Berkeleyomyces. In addition, they showed that isolates of Berkeleyomyces represented two cryptic sister species for which they provided the names B. basicola and B. rouxiae.

The aim of this study was to generate a high-quality genome sequence for B. basicola. This would allow for comparisons to be made with the available genomes of other species in Ceratocystidaceae, including those in the genera Ceratocystis, Huntiella, Davidsoniella, Thielaviopsis, Chalaropsis, Endoconidiophora and the recently described Bretziella (De Beer et al. 2017). Here we report the complete genome sequence of isolate CMW 49352, the designated reference specimen for B. basicola logged in CBS (Westerdijk Fungal Biodiversity Institute, Utrecht, The Netherlands), and the culture collection of the Forestry and Agricultural Biotechnology Institute (CMW), University of Pretoria, South Africa.

Sequenced Strain

The Netherlands: South Holland, Boskoop, isol. ex Betula sp., June 1974, S.G. De Hoog (CMW 49352 = CBS 142796; PREM 62125 = dried culture).

Nucleotide Sequence Accession Number

The draft genome sequence of Berkeleyomyces basicola (CMW 49352 = CBS 142796) has been deposited at DDBJ/ENA/GenBank under the accession number PJAC00000000. The version presented here is PJAC00000000.

Materials and Methods

Genomic DNA was extracted from lyophilized mycelium of Berkeleyomyces basicola isolate CMW 49352 grown in malt yeast broth (2 % malt extract, 0.5 % yeast extract; Biolab, Midrand, South Africa) using the method described by Duong et al. (2013). A paired-end library was prepared (350 bp average insert size) and sequenced using the Illumina HiSeqX platform. A mate-pair library was prepared (10 kb average insert size) and sequenced using the Illumina HiSeq 2500 platform. Long reads were also generated using one cell of the single-molecule real-time (SMRT or PacBio) sequencing platform (Pacific Biosciences). All sequencing was conducted at Macrogen (Seoul, Korea). Quality and adapter trimming of paired-end and mate-pair reads was carried out using Trimmomatic v. 0.36 (Bolger et al. 2014). De novo assembly of the genome was carried out using SPAdes v. 3.9 (Bankevich et al. 2012) with all paired-end, mate-pair and PacBio data. Contigs smaller than 500 bp were removed from the dataset. Initial scaffolding was done using SSPACE-standard v. 3.0 (Boetzer et al. 2011) with the paired-end and mate-pair reads. A second round of scaffolding was done using SSPACE-Longread with the PacBio reads. Assembly gaps were filled using GapFiller v. 1.10 (Boetzer & Pirovano 2012) with the paired-end and mate-pair reads, and using PBJelly (English et al. 2012) with the PacBio reads. Final genome polishing was done using Pilon (Walker et al. 2014). Genome completeness was assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO v. 1.1b1) tool using the Ascomycota dataset (Simão et al. 2015). The number of protein-coding genes was determined using Augustus v. 3.3.2 (Stanke et al. 2004) with the pre-optimised species model for Fusarium graminearum.
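The length filter applied to the assembled contigs can be sketched as follows. This is an illustrative stand-alone implementation and not the procedure the authors used; the function names and the toy FASTA records are assumptions, and only the 500 bp threshold comes from the text.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_length=500):
    """Keep contigs of at least min_length bp (shorter contigs are removed)."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]


# Hypothetical contigs of 600, 499 and 500 bp (the last split over two lines).
toy_fasta = (
    ">contig_a\n" + "A" * 600 + "\n"
    ">contig_b\n" + "C" * 499 + "\n"
    ">contig_c\n" + "G" * 250 + "\n" + "T" * 250 + "\n"
)

kept = filter_contigs(parse_fasta(toy_fasta))
kept_names = [name for name, _ in kept]
print(kept_names)
```

In this toy input, only the 499 bp contig falls below the threshold and is dropped.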

Results and Discussion

The paired-end, mate-pair, and PacBio sequencing yielded 431 141 384, 60 673 400 and 42 422 reads, respectively. Final assembly consisted of 81 contigs, with the largest around 3.8 Mb and an N50 of 1.2 Mb. The estimated size of the genome is around 25.1 Mb with a GC content of 52 %. This estimated size is similar to that of other species in Ceratocystidaceae, which range between 25.4 Mb for Huntiella moniliformis and 33.6 Mb for Davidsoniella virescens (Wilken et al. 2013, Van der Nest et al. 2014a, b, Wingfield et al. 2015a, b, 2016a, b). The phylogenetic position of Berkeleyomyces basicola is presented in Fig. 4.

Fig. 4

Maximum Likelihood (ML) phylogram derived from the analyses of the partial MCM7 gene sequences for species in Ceratocystidaceae. CLCbio Genomics Workbench v. 9.5 (CLCbio, QIAGEN, Aarhus, Denmark) was used to screen the genome of B. basicola isolate CMW 49352 to identify and extract the MCM7 gene using an available reference sequence for the gene from B. basicola (Accession: MF967102). A dataset was prepared based on the phylogenies of Nel et al. (2017) and sequences were downloaded from NCBI GenBank. DNA sequence alignments of the dataset were done using the online version of MAFFT v. 7 (Katoh & Standley 2013). The ML analyses were performed in MEGA v. 6.06 (Tamura et al. 2013) using the GTR model. Values shown at nodes are confidence values >75 %. The sequence from the B. basicola genome is indicated in bold.

BUSCO analysis predicted an assembly completeness of 97.4 %. The assembly contained 1280 complete single-copy BUSCOs, one complete and duplicated BUSCO, 10 fragmented BUSCOs and 24 missing BUSCOs out of a total of 1315 BUSCO groups searched. AUGUSTUS annotation predicted 10 074 putative coding regions, corresponding to around 401 ORFs/Mb. The availability of the genome for B. basicola will make possible genome comparisons with other species in Ceratocystidaceae and facilitate investigations into factors involved in pathogenicity, ecology, mating, and evolution of this important plant pathogen.
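The completeness and density figures above can be reproduced from the reported counts. The sketch below assumes that completeness is calculated as complete BUSCOs (single-copy plus duplicated) divided by all BUSCO groups searched; the function name is hypothetical, and the input values are those given in the text.

```python
def busco_summary(single_copy, duplicated, fragmented, missing):
    """Summarise BUSCO counts; completeness = (single + duplicated) / total."""
    total = single_copy + duplicated + fragmented + missing
    complete = single_copy + duplicated
    return {
        "total": total,
        "complete": complete,
        "complete_pct": 100.0 * complete / total,
    }


# Counts reported for B. basicola CMW 49352.
summary = busco_summary(single_copy=1280, duplicated=1, fragmented=10, missing=24)

# Predicted coding regions divided by the estimated genome size in Mb.
orf_density = 10074 / 25.1

print(f"{summary['complete_pct']:.1f} % complete of {summary['total']} groups")
print(f"{orf_density:.0f} ORFs/Mb")
```

Under this assumption, the counts give 97.4 % completeness over 1315 groups and approximately 401 ORFs/Mb, matching the values stated above.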

Authors: W.J. Nel, T.A. Duong, A. Hammerbacher, B.D. Wingfield, M.J. Wingfield, and Z.W. De Beer

*Contact: janine.nel@fabi.up.ac.za

IMA Genome-F 9D

Draft nuclear genome assembly for Ceratocystis smalleyi

Introduction

The genus Ceratocystis as defined by De Beer et al. (2014) is a diverse assemblage of species that are best known as pathogens of angiosperm trees and commercially grown root crops (De Beer et al. 2014, Li et al. 2016, Seifert et al. 2013, Van Wyk et al. 2012). Among these, C. fimbriata s. lat. is arguably the best-known pathogen, and has been associated with diseases of sweet potato (Halsted & Fairchild 1891), taro (Huang et al. 2008), pomegranate (Somasekhara 1999) and kiwifruit (Piveta et al. 2016). The related C. manginecans causes disease on mango and Acacia mangium trees in Oman and Pakistan (Al-Subhi et al. 2006, Al Adawi et al. 2013), while C. eucalypticola is responsible for mortality on commercially planted eucalypt trees in South Africa (Van Wyk et al. 2012). These fungi all share a very similar morphology, making their species boundaries difficult to determine (Fourie et al. 2014, Harrington et al. 2014). In contrast, several other species in the genus are clearly defined, with universally accepted species status (Engelbrecht & Harrington 2005). These include C. albifundus (a pathogen of commercially propagated Acacia mearnsii and Protea cynaroides in South Africa; Lee et al. 2016), C. cacaofunesta (causing cacao wilt in the Caribbean and Central and South America; Engelbrecht et al. 2007), and C. smalleyi (agent of hickory decline in the USA; Johnson et al. 2005).

Ceratocystis smalleyi was first isolated from a hickory tree (Carya sp.) that had been infested by the hickory bark beetle Scolytus quadrispinosus (Johnson et al. 2005). In 2005, C. smalleyi was formally named and described after additional isolates were collected from Carya trees that had been attacked by the hickory bark beetle across parts of the eastern US (Johnson et al. 2005). The authors subsequently linked C. smalleyi with the decline of hickory through a possible association with the bark beetle S. quadrispinosus (Johnson et al. 2005). Later studies have confirmed C. smalleyi as a pathogen on Carya species (Park et al. 2010, 2013), and established the close association between the fungus and the bark-beetle (Juzwik et al. 2010). This makes C. smalleyi the only known Ceratocystis species to be associated with a bark-beetle. In other Ceratocystis species, the production of volatiles is linked to attracting insects for dispersal (Van Wyk et al. 2009, 2012). The specific association between C. smalleyi and the vector S. quadrispinosus would eliminate the need for producing volatile attractants, and could explain the inability of this species to produce the fruity odours characteristic of other Ceratocystis species (Harrington 2009; Johnson et al. 2005).

In this study, we aimed to produce a draft genome assembly for C. smalleyi. This would make C. smalleyi the seventh Ceratocystis species for which a genome sequence has been published, and would add to the valuable genomic resources available for members of Ceratocystidaceae (Molano et al. 2018, Van der Nest et al. 2014a, b, 2015, Vanderpool et al. 2017, Wilken et al. 2013, 2018, Wingfield et al. 2015a, b, 2016a, b). Furthermore, the availability of a genome assembly will afford the opportunity in future to investigate aspects of the unique biology of C. smalleyi.

Sequenced Strain

USA: Wisconsin: Hickory Ridge, isol. ex Carya cordiformis, Oct. 1993, G. Smalley (CMW 14800, CBS 114724, BPI 843722 — dried culture).

Nucleotide Sequence Accession Number

This Whole Genome Shotgun project for Ceratocystis smalleyi isolate CMW 14800 has been deposited at DDBJ/ENA/GenBank under accession NETT00000000. The version described in this paper is version NETT01000000.

Materials and Methods

Ceratocystis smalleyi isolate CMW 14800 was obtained from the culture collection of the Forestry and Agricultural Biotechnology Institute (FABI) and grown on 2 % malt extract agar (MEA: 2 % w/v, Biolab, South Africa) at 25 °C. A 14 d old culture was used to isolate genomic DNA using a previously described phenol-chloroform protocol (Roux et al. 2004). The isolated DNA was submitted for sequencing on an Illumina Genome Analyzer IIx at the UC Davis Genome Centre (University of California, Davis). For sequencing, paired-end libraries of 350 bp and 600 bp insert sizes were prepared and sequenced following the protocol provided by Illumina (www.illumina.com). The raw sequencing reads were imported into CLC Genomics Workbench v. 7.5.1 (CLCBio, Aarhus), and default settings were used both to trim the reads for quality and to produce a de novo genome assembly from the trimmed reads. Scaffolds were generated from the assembly using SSPACE v. 2.0 (Boetzer et al. 2011), while GapFiller v. 2.2.1 (Boetzer & Pirovano 2012) was used to fill any gaps created during scaffolding. Sequencing coverage was estimated by mapping the trimmed sequencing reads to the contigs, while an estimate of the number of putative open reading frames (ORFs) was obtained through de novo gene prediction using the web-based version of AUGUSTUS and gene models from Fusarium graminearum (Keller et al. 2011). The Benchmarking Universal Single-Copy Orthologs (BUSCO v. 1.22) tool was used in combination with the fungal data set to provide a quantitative measure of the level of genome completeness (Simão et al. 2015). The 60S, LSU and MCM7 gene regions were extracted from the genome and, together with these regions from the recently sequenced species C. cacaofunesta (Molano et al. 2018), T. punctulata isolate CMW1032 (Wilken et al. 2018), H. savannae (Van der Nest et al. 2015) and A. xylebori (Vanderpool et al. 2017), were added to the Ceratocystidaceae dataset used for phylogenetic analysis by Wingfield et al. (2017). The resulting datasets were aligned using MUSCLE (Edgar 2004), concatenated, and used to construct a Maximum Likelihood phylogeny using PhyML 3.1 (Guindon et al. 2010) based on model parameters estimated with jModelTest 2.1.10 (Darriba et al. 2012).

Results and Discussion

The 27 311 342 bp Ceratocystis smalleyi genome was present in 2261 contigs, of which 1242 contigs were larger than 500 bp. The draft assembly yielded a genome with a G/C content of 50.6 %, an average coverage of 84× and 6682 predicted open reading frames at an average gene density of 245 ORFs/Mb. BUSCO analysis indicated a genome completeness of 97 %, with 1394 of the 1438 searched orthologs being complete in the genome. In total, 1330 orthologs occurred as single copies while 64 were duplicated. Of the remaining searched orthologs, 37 were fragmented and seven were missing from the genome assembly.

The genome of C. smalleyi was comparable in size and gene content to that of other Ceratocystis species (Wingfield et al. 2015b, 2016a, b). At 27.3 Mb, the C. smalleyi genome is slightly larger than that of the related species C. harringtonii (genome size of 26 Mb; Wingfield et al. 2016b), but smaller than the genome of C. manginecans (31.7 Mb; Van der Nest et al. 2014b). Gene densities for published Ceratocystis genomes range from 204–257 ORFs/Mb (Wingfield et al. 2015b, 2016a, b), and the C. smalleyi gene density falls within this range. In contrast, the 50.6 % G/C content of the C. smalleyi genome is unusually high, with all other Ceratocystis species showing G/C contents below 49 % (Wingfield et al. 2015b, 2016a, b).
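For readers unfamiliar with the statistic, G/C content is conventionally the percentage of G and C bases among all unambiguous bases in an assembly. The following minimal sketch illustrates that definition; the function name and the toy sequence are hypothetical, and whether ambiguous bases (N) were excluded in the original analysis is an assumption.

```python
def gc_content(sequence):
    """Return the G/C content (%) of a sequence, ignoring non-ACGT symbols."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * gc / (gc + at)


# Hypothetical sequence with mixed case and two ambiguous bases.
toy_sequence = "ATGCNNGCgc"
toy_gc = gc_content(toy_sequence)
print(f"G/C content: {toy_gc:.1f} %")
```

For a whole assembly, the same calculation would be applied to the concatenated contigs, or equivalently to the summed base counts across contigs.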

The availability of multiple Ceratocystis genomes (Fig. 5) provides the opportunity to study the genetic aspects that underlie ecological and life-style differences between members of this genus. Understanding these differences will also be crucial in explaining at least some of the variations in gene content, genome size, and G/C content evident among these genomes. Currently, the published Ceratocystis genomes make up the bulk of the Ceratocystidaceae genome resource with published genomes available for seven species (Fig. 5; Molano et al. 2018, Van der Nest et al. 2014a, b, Wilken et al. 2013, Wingfield et al. 2015b, 2016b). In addition, published genome sequences are available for five Huntiella species (Van der Nest et al. 2014a, b, 2015, Wingfield et al. 2016b, 2017), two Endoconidiophora species (Wingfield et al. 2016a), three isolates representing two Thielaviopsis species (Wilken et al. 2018, Wingfield et al. 2015a, b), one Davidsoniella species (Wingfield et al. 2015b), one Bretziella species (previously Ceratocystis fagacearum; De Beer et al. 2017, Wingfield et al. 2016b), one Ambrosiella species (Vanderpool et al. 2017) as well as for C. adiposa (Wingfield et al. 2016a). This brings the number of published Ceratocystidaceae genomes to 21, with the genome assemblies of several others publicly available (www.ncbi.nlm.nih.gov/assembly/?term=ceratocystidaceae). Such a vast genomic resource will prove valuable to future studies on Ceratocystidaceae, a family that includes fungal species with diverse life-styles and hosts.
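The total of 21 published genomes can be verified by tallying the groups listed above. In this sketch the dictionary keys are informal labels chosen for illustration; note that Thielaviopsis is counted by isolate (three) rather than by species (two), which is the reading under which the total matches the text.

```python
# Published Ceratocystidaceae genomes per group, as enumerated in the text.
published_genomes = {
    "Ceratocystis": 7,        # species, including C. smalleyi
    "Huntiella": 5,           # species
    "Endoconidiophora": 2,    # species
    "Thielaviopsis": 3,       # isolates representing two species
    "Davidsoniella": 1,
    "Bretziella": 1,
    "Ambrosiella": 1,
    "C. adiposa": 1,
}

total_genomes = sum(published_genomes.values())
print(f"Published Ceratocystidaceae genomes: {total_genomes}")
```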

Fig. 5

A Maximum Likelihood phylogeny showing Ceratocystidaceae isolates for which published whole-genome sequences are available, including that of C. smalleyi discussed here. The 60S, LSU, and MCM7 gene regions were used, and were either extracted from the assembled genomes or obtained from the study of Wingfield et al. (2017). The phylogeny was constructed using the TrN+I+G model with confidence values based on 1000 bootstrap replicates. Only bootstrap values ≥ 75 are shown.

Authors: P.M. Wilken*, M.A. van der Nest, E.T. Steenkamp, K. Naidoo, M.J. Wingfield, and B.D. Wingfield

* Contact: Markus.Wilken@fabi.up.ac.za

IMA Genome-F 9E

Draft genome sequences of two Cercospora beticola strains from table beet

Introduction

The genus Cercospora (Mycosphaerellaceae) includes several economically important plant pathogens causing leaf and fruit spots on a range of agricultural crops worldwide (Groenewald et al. 2013). Cercospora species are known to produce cercosporin, a photo-activated toxin that contributes to pathogenicity on a broad range of crops (Daub et al. 2000). Cercospora beticola is the cause of Cercospora leaf spot (CLS) on sugar and table beet (Beta vulgaris ssp. vulgaris), and Swiss chard (Beta vulgaris ssp. cicla) worldwide (Franc 2010). In New York, CLS is the most important disease affecting foliar health of table beet. Symptoms include leaf spots and necrotic lesions with red to purple margins, which coalesce as the disease progresses, and can result in complete defoliation (Pethybridge et al. 2017). In broadacre production systems, maintenance of foliar health is important to enable mechanized harvest. For fresh market sales, the presence of CLS lesions on the leaves may result in rejection (Pethybridge et al. 2017).

The control of CLS in table beet is dependent on fungicides (Pethybridge et al. 2017). However, resistance to single-site mode of action fungicides threatens the durability of CLS control. Recent studies reported a high frequency of isolates with resistance to quinone outside inhibitor fungicides in New York (Vaghefi et al. 2016). Moreover, succinate dehydrogenase inhibitor fungicides, which are known to be effective in controlling CLS on sugar beet, failed to provide efficacious control on table beet (Pethybridge et al. 2017), and a few isolates with reduced sensitivity to demethylation inhibitors have been detected (Pethybridge, unpubl.). Identifying genomic regions associated with sensitivity to fungicides will enable rapid screening of C. beticola populations. Enhanced genomic information for this pathogen will also facilitate studies into the mechanisms of pathogenicity. De novo genome assemblies of two C. beticola strains from table beet are presented here, and made publicly available to facilitate genetic studies of this globally important plant pathogen.

Sequenced Strains

USA: New York: western New York, Batavia, from Beta vulgaris ssp. vulgaris (table beet), 2014, F.S. Hay (Tb14-085 = ICMP 21692); ibid. (Tb14-047 = ICMP 21690).

Nucleotide Sequence Accession Number

The Whole Genome Shotgun projects have been deposited at DDBJ/EMBL/GenBank under the accessions PDUH00000000 and PDUI00000000.

Methods

Two C. beticola isolates belonging to opposite mating types (ICMP 21690 [MAT-2] and ICMP 21692 [MAT-1]), collected from table beet in New York, were selected for whole genome sequencing. The identity of the strains as C. beticola was confirmed through multi-locus sequence typing of five loci: ITS, actin, calmodulin, histone H3 and translation elongation factor 1-α (Fig. 6). Fungal strains were cultured in clarified V8 broth (10 % (v/v) clarified V8 juice (Campbell’s Soup, USA), 0.5 % (w/v) CaCO3). Seven-day-old mycelia were harvested, and genomic DNA was extracted as described in Vaghefi et al. (2016). The extracted DNA samples were quantified using a Qubit fluorometer (Invitrogen, NY).

Fig. 6

Identity verification of Cercospora beticola isolates sequenced in this study. The phylogeny was constructed by Bayesian inference (MrBayes v. 3.1.2; Ronquist & Huelsenbeck 2003) based on the sequences of five loci: ITS, act, cmd, his and tef1-α. Sequence alignments were produced using MAFFT v. 7 (Katoh & Standley 2013). Branches with posterior probability of 1.00 are thickened. The tree was rooted to C. zeae-maydis (CBS 117757).

A total of 5.6 and 5.0 µg of genomic DNA from ICMP 21692 and ICMP 21690, respectively, were used to prepare PCR-free libraries with an average insert size of ∼550 bp, which were sequenced on the Illumina MiSeq platform (paired-end, 2×300 bp) at the Cornell University Institute of Biotechnology Genomics Facility (Ithaca, NY). The PCR-free libraries were constructed using Illumina’s TruSeq Nano DNA LT Sample Preparation kits, according to the manufacturer’s protocol. This yielded 4 607 564 and 4 798 846 paired-end reads, totalling 2.7 and 2.9 Gb of data for ICMP 21692 and ICMP 21690, respectively. Quality control of the sequences was conducted using FastQC v. 0.11.2 (www.bioinformatics.bbsrc.ac.uk/projects/fastqc) in the GALAXY portal (Afgan et al. 2016). The k-mer counting software Jellyfish v. 2.2.3 (Marçais & Kingsford 2011) was used to estimate the genome size.

De novo genome assembly was conducted using DISCOVAR de novo v. 52488, an assembler designed for de novo assembly of long Illumina paired-end reads from single PCR-free libraries (Weisenfeld et al. 2014). The completeness of the final assemblies was assessed using Benchmarking Universal Single-Copy Orthologs (BUSCO) v. 1.2 (Simão et al. 2015). Gene prediction was conducted for ICMP 21692 only, in the genome annotation pipeline Maker v. 2.31.9 (Cantarel et al. 2008), using contigs of at least 500 bp in length. A preliminary annotation used the ab initio gene prediction program SNAP (Korf 2004). The resulting annotation was used to produce a hidden Markov model (HMM) profile for C. beticola, which was further refined with a second stage of SNAP training. The refined HMM file was used for the final annotation (Cantarel et al. 2008).

Results and Discussion

Illumina paired-end (2×300 bp) sequencing of C. beticola isolates ICMP 21692 and ICMP 21690 resulted in 4 607 564 and 4 798 846 reads, with mean base quality of 28.5 and 28.7, respectively. The genome size of C. beticola was estimated at ∼37 Mb, which corresponds to an approximate genome coverage of at least 74× for both strains. The draft genome of ICMP 21692 had a total assembly size of ∼35.03 Mbp (for 1 kb+ scaffolds), a scaffold N50 value of 1 023 488 bp, and a maximum contig size of 3 283 856 bp. The draft genome of ICMP 21690 had a total assembly size of ∼34.5 Mbp (for 1 kb+ scaffolds), a scaffold N50 value of 654 439 bp, and a maximum contig size of 2 437 838 bp. Both assemblies were 97 % complete based on BUSCO content. All contigs with a length of ≥ 200 bp were submitted to the NCBI genome database. Ab initio gene prediction for ICMP 21692 using trained SNAP identified 12 834 open reading frames (ORFs).
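The relationship between read counts, read length, genome size and coverage can be checked with a short calculation. The sketch below assumes that the reported read counts are read pairs and that every read is a full 300 bp long, so it gives an upper bound on the raw coverage; the variable and function names are illustrative only.

```python
READ_LENGTH = 300          # bp, from the 2x300 bp MiSeq run
GENOME_SIZE = 37_000_000   # bp, the ~37 Mb genome size estimate

# Read counts reported for each isolate (assumed here to be read pairs).
read_pairs = {"ICMP 21692": 4_607_564, "ICMP 21690": 4_798_846}


def max_coverage(pairs, read_length=READ_LENGTH, genome_size=GENOME_SIZE):
    """Upper-bound coverage: each pair contributes two full-length reads."""
    return pairs * 2 * read_length / genome_size


coverage = {strain: max_coverage(n) for strain, n in read_pairs.items()}
for strain, cov in coverage.items():
    print(f"{strain}: about {cov:.1f}x")
```

Under these assumptions both isolates come out between 74× and 78×, which is consistent with the stated coverage of at least 74×.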

The estimated genome size of C. beticola (37 Mb) is comparable to that of multiple Cercospora species, including C. canescens (∼34 Mb; Chand et al. 2015), C. cf. sigesbeckiae (∼35 Mb; Albu et al. 2017), and C. zeina (∼37 Mb; Wingfield et al. 2017). The draft genome of ICMP 21692 has already provided the foundation for global population genetics studies of C. beticola using microsatellite markers and Genotyping-By-Sequencing (Vaghefi et al. 2017a, b). Current studies are focused on identification and characterisation of the genes responsible for sensitivity to fungicides. Availability of genomic data will provide a powerful tool for characterising the genes involved in pathogenicity.

Authors: N. Vaghefi*, J.R. Kikkert, and S.J. Pethybridge

*Contact: Niloofar.Vaghefi@usq.edu.au

IMA Genome-F 9F

Draft genome sequence of Coleophoma cylindrospora BP6252, the fungus used to produce sulfated echinocandins FR220897 and FR220899

Introduction

Researchers at Fujisawa Pharmaceutical (now Astellas Pharma) isolated strain BP6252 (No. 14573) from an unidentified decaying leaf from Tsushima Island, Japan, and identified it as Coleophoma empetri. They fermented the strain to produce two water-soluble echinocandin analogues, FR220897 and FR220899 (WF14573A and WF14573B) (Hori et al. 2004, Kanasaki et al. 2006) (Fig. 2). FR220897 and FR220899 are isomers of FR901379, which is used for the semisynthesis of micafungin. FR901379 is produced by a different strain of C. empetri, F-11899 (Iwamoto et al. 1994a, b). The differential antifungal activity of these isomers was critical to understanding how the position of the homotyrosine sulfate residue affects antifungal activity (Hino et al. 2001, Kanasaki et al. 2006). Like other echinocandins, the metabolites strongly inhibited β-1,3-glucan synthase and exhibited potent in vitro activity against Candida albicans and Aspergillus fumigatus, and FR220897 was effective in mouse candidiasis models. The discovery of these echinocandin variants was significant because sulfation of the homotyrosine residue overcomes the inherent poor water solubility that had previously impeded development of echinocandin-type antibiotics, including echinocandin B, the aculeacins, and the pneumocandins.

Coleophoma cylindrospora is a widespread endophyte and leaf saprobe and can be a weak pathogen of leaves and fruits of many woody plants (Sutton 1980, Wu et al. 1996, Polashock et al. 2009, Crous & Groenewald 2016). The phylogenetic affinity of the strain producing FR220897 and FR220899 was established with multiple phylogenetic marker sequences and was found to be conspecific with other strains of C. empetri (Yue et al. 2015) (Fig. 7). Subsequently, during a revision of the polyphyletic genus Coleophoma, C. empetri was found to be phylogenetically indistinct from the similar C. cylindrospora and was considered to be a synonym of the latter (Crous & Groenewald 2016).

Fig. 7

Maximum Likelihood tree of genome-sequenced strains producing echinocandins (red) and selected strains of the Leotiomycetes (55 strains total) based on an alignment of the ITS and 28S rDNA. Botryotinia fuckeliana was positioned as the outgroup. The tree was inferred by using the maximum likelihood method based on the Kimura 2-parameter model. The tree with the highest log likelihood (−4229.10) is shown. The percentage of trees in which the associated taxa clustered together is labelled on branch nodes. A discrete gamma distribution with five categories was used to model evolutionary rate differences among sites (+G, parameter = 0.4881). Branch lengths were measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated. The dataset included 955 positions. Data were analyzed in MEGA7 (Kumar et al. 2016).

The primary objective behind sequencing the genome of C. cylindrospora was the identification of the gene cluster encoding the biosynthesis of FR220897 and FR220899 (Yue et al. 2015). The genome sequence will be essential for identifying the mechanism of the regiospecific sulfation reaction. The draft genome has also revealed that the strain harbours an auxiliary copy of β-1,3-glucan synthase that may function as an echinocandin resistance gene (Yue et al. 2018). This draft genome will expand genomic data sets for comparative genomics of species in Leotiomycetes, Dermataceae, and endophytic fungi in general.

Sequenced Strain

Japan: Tsushima Island: Nagasaki Prefecture, isolated from decaying leaf, [no further information] (NBRC-NITE BP6252, Fujisawa No. 14573).

Nucleotide Sequence Accession Number

The C. cylindrospora isolate BP6252 Whole Genome Shotgun project has been deposited in GenBank under the accession number PDLM00000000.

Materials and Methods

Lyophilized mycelia harvested from liquid cultures were ground in liquid nitrogen and genomic DNA was isolated by using the CTAB protocol (https://1000.fungalgenomes.org/home/wp-content/uploads/2013/02/genomicDNAProtocol-AK0511.pdf). A 180 bp insert library and a 5 kb mate-pair library were constructed and sequenced on an Illumina HiSeq 2000 V4 sequencing platform (Yue et al. 2015). The Illumina sequencing reads were assembled using Velvet 1.2 (Zerbino & Birney 2008). Ab initio gene predictions from the genome assembly were made with Augustus (Stanke et al. 2004). Predicted genes were annotated by BLAST searches against UniProt databases (https://www.uniprot.org/). Polyketide synthases (PKSs), non-ribosomal peptide synthetases (NRPSs), dimethylallyl tryptophan synthases and related biosynthetic gene clusters were predicted by antiSMASH ver. 3.0 and manual annotation (Weber et al. 2015).

Results and Discussion

The genome of BP6252 was sequenced to 100-fold coverage, yielding 77 scaffolds with an N50 of 2.3 megabases (Mb). The assembled genome size was 42.4 Mb, and a total of 14 177 genes were predicted. The GC content of this genome is 48.7 %. The genome contains 26 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These core genes include 15 PKSs, eight NRPSs, two dimethylallyl tryptophan synthases, and one terpene synthase. They are distributed among 21 putative gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, nine secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. Furthermore, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue et al. 2015) was recognized and predicted to be responsible for the biosynthesis of FR220897 and FR220899.
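For readers less familiar with the N50 statistic reported for these assemblies, the following minimal sketch shows how it is commonly computed from a list of scaffold lengths. The scaffold lengths in the example are hypothetical toy values, not those of the BP6252 assembly, and the function name is illustrative.

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must not be empty")


# Hypothetical toy scaffold lengths (total 300, so the halfway point is 150).
toy_scaffolds = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

Because the scaffolds are accumulated from longest to shortest, an assembly with a few very long scaffolds has a high N50 even when it also contains many short ones.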

Authors: Q. Yue, Y. Li, L. Chen, X. Zhang, K. Li, J. Sun, X. Liu, Z. An, and G.F. Bills*

*Contact: billsge@vt.edu

IMA Genome-F 9G

Draft genome sequence of Coleophoma cylindrospora BP5796, the fungus used to produce the echinocandin variant FR209602 and related echinocandins

Introduction

Researchers at Fujisawa Pharmaceuticals (now Astellas Pharma) isolated strain BP5796 (Fujisawa No. 738) from an unidentified leaf sample collected in Japan and identified it as C. crateriformis based on comparisons of conidial dimensions with those of the Coleophoma species known at the time (Wu et al. 1996). The strain was fermented to produce three water-soluble echinocandin analogues, designated FR209602, FR209603 and FR209604 (Fig. 2). These analogues differ from FR901379 (WF11899A) and its analogues by a substitution of threonine for serine at the peptide’s third amino acid and deoxygenation of the homotyrosine residue at C-4. Like other echinocandins, these metabolites strongly inhibited the activity of β-1,3-glucan synthase and exhibited potent in vivo activity against C. albicans and A. fumigatus in murine systemic infection models.

The phylogenetic affinity of the strain producing FR209602 and analogues was established with multiple phylogenetic marker sequences (Yue et al. 2015) (Fig. 7). Although we had retained the original identification of C. crateriformis in previous work on the evolution of the echinocandin pathways, a multi-gene phylogeny indicated that the strain was conspecific with other strains named as C. empetri. Subsequently, during a revision of the polyphyletic genus Coleophoma, it was noted that an authentic strain of C. crateriformis, the type species of the genus Coleophoma, was lacking, and thus its phylogenetic affinities within the genus remained to be determined (Crous & Groenewald 2016). Because strain BP5796 appears to be phylogenetically indistinct from the similar C. cylindrospora, we consider it to be conspecific with the latter (Crous & Groenewald 2016).

The primary motivation for sequencing the genome of C. cylindrospora BP5796 was to identify the gene cluster encoding the biosynthesis of FR209602. The genome sequence will be essential for identifying the mechanism of the regiospecific sulfation reaction. The draft genome has also revealed that, like BP6252, the strain harbours an auxiliary copy of β-1,3-glucan synthase that may function as an echinocandin resistance gene (Yue et al. 2018). This draft genome will expand resources for comparative genomics of species in Dermataceae and endophytic fungi.

Sequenced Strain

Japan: Toyama Prefecture: Mount Tateyama, isolated from leaf sample, [no further information] (NBRC-NITE BP5796, Fujisawa No. 738).

Nucleotide Sequence Accession Number

The C. cylindrospora isolate BP5796 Whole Genome Shotgun project has been deposited in GenBank under the accession number PDLN00000000.

Materials and Methods

The methods for DNA extraction, sequencing, and genome assembly and annotation were essentially the same as for strain BP6252 above.

Results and Discussion

The genome of BP5796 was sequenced to 100-fold coverage, yielding 45 scaffolds with an N50 of 2.0 Mb. The assembled genome size was 40.4 Mb, and a total of 13 257 genes were predicted. The GC content of this genome is 48.5 %. The genome contains 24 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These core genes include 15 PKSs, six NRPSs, two dimethylallyl tryptophan synthases, and one terpene synthase. They are distributed among 21 putative gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, seven secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. Furthermore, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue et al. 2015) was recognized and predicted to be responsible for the biosynthesis of FR209602. This gene cluster deviated from other echinocandin gene clusters by the loss of a cytochrome P450 gene orthologous to htyF in A. pachycristatus and GLP450-1 in Glarea lozoyensis, which would account for the absence of hydroxylation at the homotyrosine C-4.

Authors: Q. Yue, Y. Li, L. Chen, X. Zhang, K. Li, J. Sun, X. Liu, Z. An, and G.F. Bills*

*Contact: billsge@vt.edu

IMA Genome-F 9H

Draft genome sequence of Fusarium fracticaudum

Introduction

The genus Fusarium contains numerous well-known socio-economically important fungi (Nelson et al. 1983). Many of these fungi form part of the Fusarium fujikuroi Species Complex (Geiser et al. 2013) for which various whole genome sequences have been published, e.g. Fusarium fujikuroi (Jeong et al. 2013, Wiemann et al. 2013, Chiara et al. 2015), Fusarium temperatum (Wingfield et al. 2015b) and Fusarium circinatum (Wingfield et al. 2012, van der Nest et al. 2014a). The latter causes pitch canker, which is a devastating disease of pine (Wingfield et al. 2008). Of the five other species found to be associated with F. circinatum-like symptoms on pine in Colombia (Herron et al. 2015), the genome of F. pininemorale has been sequenced (Wingfield et al. 2017). In this study, we determined the whole genome sequence for F. fracticaudum, which was also described by Herron et al. (2015). Like F. pininemorale, this species does not seem to be a pathogen of pine as it could not incite lesions on the stems of pine seedlings in standard pathogenicity assays (Herron et al. 2015). These differences between F. circinatum and these non-pathogenic Fusarium species on Pinus will provide an opportunity for genome comparisons.

The association of F. fracticaudum with diseased pines and the genetic basis of biological traits in F. fracticaudum are not yet understood. Availability of various sequenced genomes of species within the FFSC is enabling studies into the biology and evolution of these fungi (Ma et al. 2013, De Vos et al. 2014, Niehaus et al. 2016). Here we determine the genome sequence of F. fracticaudum, which will provide an additional resource for comparative genomic studies aimed at understanding the evolution of these fungi and unravelling the molecular basis of their plant interactions.

Sequenced Strain

Colombia: Angela Maria, Santa Rosa Risalda, 75°36′21″ W 4°49′18″ N, isolated from diseased Pinus maximinoi trees (CMWF25245; FCC5385; CBS137234; Herron et al. 2015).

Nucleotide Sequence Accession Number

This Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under the accession PDNT00000000. The version described in this paper is version PDNT01000000.

Methods

Genome sequence

The F. fracticaudum isolate was grown on ½ Potato Dextrose agar (PDA; BD Difco™) at 25 °C for 7 d. Genomic DNA was extracted from fungal mycelium following the protocol of Möller et al. (1992). Genome sequencing was done with one paired-end (350 bp median insert size) and one mate-pair (5 kb median insert size) library using the Illumina HiSeq X Ten and HiSeq 2000 platforms, respectively, at Macrogen (Seoul, Korea). CLC Genomics Workbench v. 8.0.1 (CLCBio, Aarhus, Denmark) was used to filter out sequences shorter than 18 bp. The quality-filtered reads were subjected to de novo assembly in ABySS v. 1.3.7 (Simpson et al. 2009), followed by scaffolding with SSPACE v 2.0 (Boetzer et al. 2011). The gaps within the sequences were closed using GapFiller v. 1.11 (Boetzer & Pirovano 2012). To determine the completeness of the genome assembly, BUSCO v2.0.1 (Benchmarking Universal Single Copy Orthologs; Simão et al. 2015) was employed using the sordariomycete dataset. Scaffolds were compared to the chromosomes of F. fujikuroi (Wiemann et al. 2013) and F. temperatum (Wingfield et al. 2015b) using the LASTZ plugin (Harris 2007) of Geneious v 7.0.4 (Kearse et al. 2012). WebAUGUSTUS (Hoff & Stanke 2013) was used to predict genes using the Fusarium graminearum model (https://bioinf.uni-greifswald.de/augustus) and the cDNA data from the F. circinatum genome (Wingfield et al. 2012) as gene evidence.

Phylogenetic analysis

Phylogenetic analysis was conducted using partial sequences of the translation elongation factor 1-α (ef1-α) and β-tubulin genes, extracted from the F. fracticaudum genome determined here and obtained for other species in the Fusarium fujikuroi species complex (Herron et al. 2015). All gene sequences were aligned using MAFFT (Katoh et al. 2009). A maximum likelihood phylogenetic analysis was carried out in PhyML v 3.1 (Guindon et al. 2010) with 1000 bootstrap replicates, using the GTR+I+G substitution model as determined with jModelTest v 2.1.7 (Darriba et al. 2012).

Results and Discussion

The assembled genome of F. fracticaudum was 46.29 Mb long with a GC content of 47.6 %. The assembly consisted of 50 scaffolds with an N50 value of 4 491 441 bp. WebAUGUSTUS predicted a total of 14 729 open reading frames (ORFs) in the assembly. Based on the BUSCO results, the assembly was 98.8 % complete (i.e., complete and single-copy BUSCOs = 97.6 %; complete and duplicated BUSCOs = 1.2 %; fragmented BUSCOs = 0.9 %; missing BUSCOs = 0.3 %; number of BUSCOs searched = 3725). The phylogeny inferred using two protein-coding genes also recovered the previously reported relationships among the FFSC species included (Fig. 8) (O’Donnell et al. 2000, Geiser et al. 2005, Herron et al. 2015). The sequences extracted from the F. fracticaudum genome also grouped with those of another isolate (CBS 137233) and with the original GenBank accessions for the isolate sequenced here.
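The way the overall completeness figure follows from the individual BUSCO categories can be verified with a few lines of arithmetic. The sketch below uses only the percentages reported above; the dictionary keys and variable names are illustrative.

```python
# BUSCO category percentages as reported for the F. fracticaudum assembly.
busco_percent = {
    "complete_single_copy": 97.6,
    "complete_duplicated": 1.2,
    "fragmented": 0.9,
    "missing": 0.3,
}

# Completeness counts both single-copy and duplicated complete BUSCOs.
complete = (busco_percent["complete_single_copy"]
            + busco_percent["complete_duplicated"])
# All four categories together should account for every BUSCO searched.
total = sum(busco_percent.values())

print(f"complete: {complete:.1f} %, all categories: {total:.1f} %")
```

The complete fraction sums to the stated 98.8 %, and the four categories together account for all of the BUSCOs searched.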

Fig. 8

A Maximum Likelihood phylogeny showing the placement of the F. fracticaudum isolate (indicated in bold) that was sequenced in this study. The tree was inferred from combined β-tubulin and translation elongation factor 1-α gene sequences (Herron et al. 2015). Values at branch nodes are bootstrap support values; only those ≥ 85 % are shown. The scale bar indicates substitutions per site.

In terms of overall genome statistics, the whole genome sequence of F. fracticaudum is similar to those reported for F. pininemorale, F. circinatum, and F. temperatum (Table 3). Also, F. fracticaudum contained the reciprocal translocation between chromosomes 8 and 11 known in these fungi (De Vos et al. 2014). However, sequence comparisons showed that chromosome 12, which is dispensable in other members of the FFSC (Xu et al. 1995), is 1 094 708 bp in size in F. fracticaudum. This is considerably larger than the 692 922 bp reported for F. fujikuroi (Wiemann et al. 2013), 986 231 bp for F. temperatum (Wingfield et al. 2015b), 791 442 bp for F. nygamai (Wingfield et al. 2015a) and 968 722 bp for F. pininemorale (Wingfield et al. 2017). The differences observed among these genomes highlight the importance of sequencing the genomes of additional species in the FFSC. The F. fracticaudum genome sequenced here, together with those of other FFSC species, will provide a platform to answer numerous questions pertaining to the evolutionary history of these fungi and their species-specific traits.
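To put the chromosome 12 sizes quoted above side by side, the following sketch computes how much shorter each previously reported chromosome 12 is than that of F. fracticaudum. Only the sizes given in the text are used; the variable names are illustrative.

```python
# Chromosome 12 sizes (bp) as quoted in the text.
chr12_bp = {
    "F. fracticaudum": 1_094_708,
    "F. fujikuroi": 692_922,
    "F. temperatum": 986_231,
    "F. nygamai": 791_442,
    "F. pininemorale": 968_722,
}

reference = chr12_bp["F. fracticaudum"]
# Difference between F. fracticaudum and each of the other species.
shorter_by = {sp: reference - bp
              for sp, bp in chr12_bp.items() if sp != "F. fracticaudum"}

for species, diff in sorted(shorter_by.items(), key=lambda kv: kv[1]):
    print(f"{species}: {diff:,} bp shorter "
          f"({diff / reference:.1%} of the F. fracticaudum size)")
```

The differences range from roughly 0.1 Mb (F. temperatum) to roughly 0.4 Mb (F. fujikuroi).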

Table 3 Genome statistics for F. fracticaudum and its close relatives.

Authors: B.S. Swalarsk-Parry*, E.T. Steenkamp, S. van Wyk, T.A. Duong, B.D. Wingfield, and L. De Vos

*Contact: benny.swalarsk@fabi.up.ac.za

IMA Genome-F 9I

Draft genome sequence of Phialophora cf. hyalina BP5553, a fungus that produces the sulfated pneumocandin FR190293

Introduction

Researchers at Fujisawa Pharmaceuticals (now Astellas Pharma) isolated strain BP5553 (Fujisawa No. 16616) from soil collected in Japan and identified it as Tolypocladium parasiticum. They fermented the strain to produce the water-soluble echinocandin analogue FR190293 (Fig. 2). Like other echinocandins, FR190293 strongly inhibited β-1,3-glucan synthase and exhibited potent in vitro activity against Candida albicans and Aspergillus fumigatus. The discovery of this new echinocandin variant was significant because it is the first of the echinocandins to have a dimethyl myristic acid acyl side chain, as in the pneumocandins, in combination with a sulfated homotyrosine residue.

As previously reported, in-depth phylogenetic and morphological analysis of BP5553 demonstrated that the identification as the rotifer parasite T. parasiticum (syn. Pochonia parasitica) was erroneous. Rather than belonging to Clavicipitaceae, BP5553 was found to belong in Helotiales (Yue et al. 2015). Based on rDNA and other protein-encoding sequences, BP5553 falls within a monophyletic lineage along with ex-type strains of Phialophora hyalina, Pleuroascus nicholsonii, Scopulariopsis parva, and Scopulariopsis parvula (Fig. 7). These strains, along with BP5553 and other species with Phialophora-like conidial morphs in Helotiales, will eventually comprise a new genus in a new family of Helotiales (W. Untereiner et al., unpubl.).

The primary objective behind sequencing the genome of Phialophora cf. hyalina was the identification of the gene cluster encoding the biosynthesis of FR190293 (Yue et al. 2015). This draft genome will expand genomic data sets for comparative genomics of species in Leotiomycetes and Helotiales.

Sequenced Strain

Japan: Fukushima Prefecture: Iwaki, isolated from soil, [no further information] (NBRC-NITE BP5553, Fujisawa No. 16616).

Nucleotide Sequence Accession Number

The Phialophora cf. hyalina isolate BP5553 Whole Genome Shotgun project has been deposited in GenBank under the accession number NPIC00000000.

Materials and Methods

The methods for DNA extraction, sequencing, and genome assembly were essentially the same as for strain BP6252 above.

Results and Discussion

The genome of BP5553 was sequenced to 102-fold coverage, yielding 32 scaffolds with an N50 of 3.8 Mb. The assembled genome size was 33.6 Mb, and a total of 10 707 genes were predicted. The GC content of this genome is 48.2 %. The genome contains 45 core catalytic genes associated with putative secondary metabolite biosynthetic gene clusters. These core genes include 19 PKSs, 13 NRPSs, six PKS-NRPS hybrids, two dimethylallyl tryptophan synthases, four terpene synthases, and one chalcone synthase. They are distributed among 40 putative biosynthetic gene clusters that also include genes encoding tailoring enzymes, regulators, transporters, and other auxiliary genes. In addition to these gene clusters, eight secondary metabolite gene clusters containing PKS-like or NRPS-like enzyme genes, or other secondary metabolism-related genes, were identified by antiSMASH. Furthermore, a gene cluster containing close orthologues of the pneumocandin gene cluster from Glarea lozoyensis (Yue et al. 2015) was recognized and predicted to be responsible for the biosynthesis of FR190293.
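The core-gene counts reported for the three echinocandin-producing strains (BP6252, BP5796 and BP5553) can be tallied to confirm that the individual enzyme classes add up to the stated totals of 26, 24 and 45. The sketch below uses only the counts given in the respective Results sections; the abbreviations used as dictionary keys (for example DMATS for dimethylallyl tryptophan synthase) are illustrative labels.

```python
# Core catalytic genes per enzyme class, as reported for each strain.
core_genes = {
    "BP6252": {"PKS": 15, "NRPS": 8, "DMATS": 2, "terpene synthase": 1},
    "BP5796": {"PKS": 15, "NRPS": 6, "DMATS": 2, "terpene synthase": 1},
    "BP5553": {"PKS": 19, "NRPS": 13, "PKS-NRPS hybrid": 6, "DMATS": 2,
               "terpene synthase": 4, "chalcone synthase": 1},
}

# Sum the classes for each strain and compare with the reported totals.
totals = {strain: sum(classes.values())
          for strain, classes in core_genes.items()}
for strain, total in totals.items():
    print(f"{strain}: {total} core catalytic genes")
```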

Authors: Q. Yue, Y. Li, L. Chen, X. Zhang, K. Li, J. Sun, X. Liu, Z. An, and G.F. Bills*

*Contact: billsge@vt.edu

IMA Genome-F 9J

Draft genome sequence of Morchella septimelata

Introduction

Morels (Morchella spp., Ascomycota) are a highly desired group of edible fungi with a worldwide distribution (Du et al. 2012, Kanwal et al. 2011, Richard et al. 2015). They have been collected by mycophiles and gourmets for hundreds of years for their delicate taste and unique appearance (Tietel & Masaphy 2017, 2018, Rotzoll et al. 2006). Morels have also been found to contain a variety of secondary metabolites with medicinal properties (Tietel & Masaphy 2018, Shameem et al. 2017, Pfab et al. 2008, Vieira et al. 2016). Morchella septimelata is a black morel belonging to the Morchella elata clade (Kuo et al. 2012). It is often found in lightly to moderately burned conifer forests, near creek beds, springs and seeps, at an altitude of 1000–2000 m (Pildain et al. 2014). The ascomata of M. septimelata are found primarily in the years immediately following forest fires, and then often appear in dwindling numbers for several seasons thereafter (Hobbie et al. 2016). In recent years, artificial cultivation of true morels has made great progress (Masaphy 2010); several Morchella species, such as M. sextelata, M. septimelata, and M. importuna, have been successfully cultivated in China, America, and other parts of the world. However, the mechanisms of growth and development in Morchella remain unclear, which leads to frequent failures and unstable yields in Morchella cultivation (Liu et al. 2018).

The genome sequence of M. septimelata from this study may help to reveal the mechanism of secondary metabolite synthesis in M. septimelata and provide some insights into its growth, development, and carbohydrate degradation.

Sequenced Strain

China: Sichuan: Liangshan Yi, N 27°49′ E 100°48′, alt. 1468 m, isolated from forest soil, 19 Sep. 2015, Chuan Xiong & Qiang Li (MG91-dried culture).

Nucleotide Sequence Accession Number

The M. septimelata isolate (culture collection number SAAS91) Whole Genome Shotgun project has been deposited at DDBJ/EMBL/GenBank under the accession number PYSJ00000000. The version described in this paper is version PYSJ01000000.

Materials and Methods

Morchella septimelata MG91 was isolated from forest soil in Liangshan Yi Autonomous Prefecture, Sichuan, China, and was preserved in the Fungal Culture Collection Center of Biotechnology and Nuclear Technology Research Institute (Chengdu, Sichuan). Genomic DNA was extracted from MG91 and sequenced on the Genome Analyzer IIx next-generation sequencing platform (Illumina) at BGI (Shenzhen, China). Paired-end libraries with insert sizes of 425 bp and 725 bp were used to generate reads of 150 bases. The CLC Genomics Workbench v. 6.0.1 (CLCBio, Aarhus, Denmark) was subsequently used to trim reads of poor quality (limit of 0.05) as well as terminal nucleotides. The remaining reads were assembled using SPAdes v. 3.0.0 with an optimized k-mer value of 21 (Bankevich et al. 2012). Thereafter, scaffolding was completed using SSPACE v. 2.0 (Boetzer et al. 2011) and gaps were reduced using GapFiller v. 2.2.1 (Boetzer & Pirovano 2012). The completeness of the assembly was evaluated using BUSCO v3 (Simão et al. 2015).

Homology-based and ab initio gene prediction were performed to identify M. septimelata gene models. Homologous proteins from Tuber melanosporum were aligned to the repeat-masked M. septimelata genome using Exonerate v 2.2.0 (Slater & Birney 2005). The filtered alignment results (above 300 bp and 90 % coverage) were used to build training models for ab initio gene prediction. The ab initio prediction was conducted using Augustus v. 3.2.3 (Stanke et al. 2008) and GeneMark-ES (Ter-Hovhannisyan et al. 2008), guided by the training models from the homology-based alignments. All gene prediction results were integrated into the final gene models by EVidenceModeler (Haas et al. 2008). Carbohydrate-active enzymes (CAZymes), including the repertoire of auxiliary enzymes, were predicted using dbCAN (Yin et al. 2012).

To verify the species identity of the sequenced strain, translation elongation factor 1-alpha gene sequences from selected Morchella species (Fig. 9) were aligned with MAFFT (Katoh & Standley 2013). The Bayesian inference (BI) method (Erixon et al. 2003) was used to construct the phylogenetic tree of the Morchella species. jModelTest v. 2.0.2 was used to ascertain the best-fit model of nucleotide substitution (Darriba et al. 2012). BI analysis was performed with MrBayes v3.2.6 (Ronquist et al. 2012). Two independent runs with four chains each (three heated and one cold) were conducted simultaneously for 2 × 10⁶ generations. Each run was sampled every 100 generations. We assumed that stationarity had been reached when the estimated sample size (ESS) was greater than 100 and the potential scale reduction factor (PSRF) approached 1.0. The first 25 % of samples were discarded as burn-in, and the remaining trees were used to calculate Bayesian posterior probabilities (BPP) in a 50 % majority-rule consensus tree.
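The sampling scheme described above implies a specific number of trees contributing to the consensus, which the following sketch works out. It reads the run length as 2 × 10^6 generations and, as a simplifying assumption, ignores the initial state at generation 0; the constant and variable names are illustrative.

```python
GENERATIONS = 2_000_000    # run length, read as 2 x 10^6 generations
SAMPLE_EVERY = 100         # sampling frequency in generations
BURN_IN_FRACTION = 0.25    # first 25 % of samples discarded
RUNS = 2                   # independent runs

# Samples drawn per run (assumption: the generation-0 state is not counted).
samples_per_run = GENERATIONS // SAMPLE_EVERY
burn_in = int(samples_per_run * BURN_IN_FRACTION)
kept_per_run = samples_per_run - burn_in
kept_total = kept_per_run * RUNS

print(f"{samples_per_run} samples per run, {burn_in} discarded, "
      f"{kept_total} trees retained across {RUNS} runs")
```

Under these assumptions each run contributes 15 000 post-burn-in trees, giving 30 000 trees for the majority-rule consensus.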

Fig. 9

A Bayesian inference (BI) phylogenetic analysis of the genus Morchella using MrBayes v3.2.6, based on partial sequences of the elongation factor 1-alpha (EF1-α) gene. Posterior probabilities are shown on the nodes of the tree. The Morchella septimelata sequence used for verification was extracted from the assembled genome. Reference sequences were obtained from the NCBI database and are shown with their accession numbers.

Results and Discussion

The genome of M. septimelata had an estimated size of 49.81 Mb with an average coverage of 151.17× (Table 4). The scaffold N50 size was 37 734 bases, and the assembly had a mean GC content of 47.40 %. The total number of scaffolds generated was 6525. A total of 11 427 genes were predicted, with an average length of 1 571 bp. A phylogenetic analysis of the genus Morchella is provided to show the position of M. septimelata (Fig. 9).

Table 4 Genome statistics, CAZYme richness and secondary metabolite clusters for the Morchella septimelata MG91 genome sequence.

The draft genome of M. septimelata is larger than those of the closely related species M. conica CCBAS932 (JGI: 1023999) and M. importuna SCYDJ1-A1 (JGI: 1047732), which are 48.21 Mb and 48.80 Mb, respectively. Fewer gene models were found in M. septimelata than in M. conica CCBAS932 (11 600 gene models) and M. importuna SCYDJ1-A1 (11 971 gene models). The average gene length of M. septimelata is also smaller than that of M. conica CCBAS932 (1668 bp) and M. importuna SCYDJ1-A1 (1625 bp). The average gene density of M. septimelata was 229 genes/Mb, which is lower than that of M. conica CCBAS932 (240 genes/Mb) and M. importuna SCYDJ1-A1 (245 genes/Mb).
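The gene densities quoted above follow directly from the gene model counts and assembly sizes. The sketch below recomputes them from the figures given in the text; the variable names are illustrative, and the values agree with the reported densities when truncated to whole numbers.

```python
# (number of gene models, assembly size in Mb) as given in the text.
assemblies = {
    "M. septimelata MG91": (11_427, 49.81),
    "M. conica CCBAS932": (11_600, 48.21),
    "M. importuna SCYDJ1-A1": (11_971, 48.80),
}

# Gene density = gene models per megabase of assembly.
density = {name: genes / size_mb
           for name, (genes, size_mb) in assemblies.items()}

for name, value in density.items():
    print(f"{name}: {value:.1f} genes/Mb")
```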

A total of 512 CAZymes were identified in the genome of M. septimelata, more than in the closely related species M. conica CCBAS932 (401 CAZymes) and M. importuna SCYDJ1-A1 (403 CAZymes), suggesting that the carbohydrate degradation ability of M. septimelata may be stronger than that of these two species. A total of nine secondary metabolite (sM) clusters were found in the M. septimelata genome, three of which were terpene clusters. The genome sequence data of M. septimelata presented in this study will provide useful information for understanding the mechanism of secondary metabolite synthesis in M. septimelata and lay a foundation for its artificial cultivation.

Authors: Q. Li, C. Xiong, X. Ma, H. Li, Y. Gao, Y. Dong*, and W. Huang*

*Contact: loyalyang@163.com; wenlih11@126.com