IMA Genome-F 10A

Nine draft genome sequences of Claviceps purpurea s.lat., including C. arundinis, C. humidiphila and C. cf. spartinae

Introduction

Claviceps purpurea (Clavicipitaceae, Hypocreales) is a plant pathogen that infects the flowers of cereal crops and grasses (Poaceae) causing ergot disease. After floral infection by the pathogen, the seeds of grass hosts are replaced with hard, dark fungal resting bodies called sclerotia or ergots. Consumption of grains contaminated with ergots is harmful to human and animal health, causing ergotism, the result of a spectrum of potent mycotoxins known as ergot alkaloids (Lyons et al. 1986, Miles et al. 1996, Scott 2009). These alkaloids have caused significant health, social and economic concerns at different times in history (Fuller 1968, Caporael 1976, Miles et al. 1996, De Groot et al. 1998, Alm 2003), but are also powerful pharmaceuticals for treating various medical conditions (De Groot et al. 1998, Crosignani 2006, Micale et al. 2006). Understanding the genetic diversity of species, their correlated toxin profiles and molecular backgrounds is important for the agricultural and pharmaceutical sectors, and regulatory agencies.

Intraspecific variations in morphology, alkaloid chemistry, genetics, and ecological niches have revealed the existence of several subgroups (Pažoutová et al. 2000) in C. purpurea s. lat. These groups were later identified as cryptic species based on multi-gene phylogenetic and population genetic analyses, i.e. C. arundinis, C. humidiphila, and C. spartinae (Douhan et al. 2008, Pažoutová et al. 2015). More recently, a few more species from South Africa were described as part of the species complex (Van der Linde et al. 2016). In Canada, the incidence of ergot disease in Alberta, Saskatchewan and Manitoba has been increasing since 2002 (Menzies & Turkington 2014). During a recent investigation of ergot fungi in agricultural areas of Canada, we discovered several new phylogenetic lineages: two closely related to but distinct from C. spartinae (G3), one close to C. humidiphila, and two more forming basal branches in the C. purpurea s. lat. complex (Fig. 1). Here, we selected representatives of these new and previously designated lineages, and sequenced and assembled their genomes. A complete multigene phylogenetic analysis, including a much greater sampling of strains, will be presented in a separate publication.

Fig. 1

One of the two MP trees showing nine strains (in bold) in relation to Claviceps lineages based on EF1-α partial region, 99 informative characters, length = 302, CI = 0.606, RI = 0.807, RC = 0.489, HI = 0.394, G-fit = −75.089. Values on branches are MP bootstrapping/BI posterior probability.
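Two of the tree statistics in this caption follow from the other two by standard parsimony identities. The rescaled consistency index is RC = CI × RI, and the homoplasy index is HI = 1 − CI. The minimal Python check below applies these identities to the reported values, rounded to three decimals. The function name is illustrative only.

```python
def parsimony_indices(ci: float, ri: float) -> dict[str, float]:
    """Derive RC and HI from the consistency index (CI) and retention index (RI)."""
    return {
        "RC": round(ci * ri, 3),   # rescaled consistency index = CI x RI
        "HI": round(1.0 - ci, 3),  # homoplasy index = 1 - CI
    }

stats = parsimony_indices(ci=0.606, ri=0.807)
print(stats)  # {'RC': 0.489, 'HI': 0.394}
```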

Sequenced Strains

Claviceps purpurea s.str.

Canada: Saskatchewan: Estevan SK1 49.14 N 102.99 W, isolated from Triticum aestivum, 2000, R. Clear [identified by J. G. Menzies] (LM28 = DAOMC 250647). Czech Republic: Bezdědice, 49.83 N 14.03 E, isolated from Secale cereale, 2003 [identified by S. Pažoutová] (LM582 = DAOMC 251723 = CCC771 ex-neotype).

Claviceps purpurea s.lat.

Canada: Alberta: North Star 58.53 N 118.12 W, isolated from Bromus inermis, 7 Sep. 1956 [identified by W. P. Campbell] (LM78 = DAOMC 250578); Metiskow, 52.41 N 110.63 W, isolated from Elymus albicans, 7 Sep. 1956 [identified by W. P. Campbell] (LM81 = DAOMC 250581). Quebec: Côte-Nord, MRC Minganie, Pointe-Parent, 50.13 N 61.08 W, isolated from Ammophila sp., 8 Sep. 2015, J. Cayouette & Y. Dalpé (LM458 = DAOMC 251898).

Claviceps cf. spartinae

Canada: Manitoba: Grant’s Field Snowflake 49.05 N 98.66 W, isolated from Phalaris arundinacea, 2014, J. Menzies [identified by M. Liu] (LM218 = DAOMC 251843). Quebec: Maria-Chapdelaine, parc national de la Pointe-Taillon, river du Lac Saint-Jean, 48.67 N 71.87 W, isolated from Ammophila breviligulata, 31 Aug. 2014, J. Cayouette [identified by M. Liu] (LM454 = DAOMC 251845 = DAOM 550246b-specimen).

Claviceps humidiphila

Germany: Bavaria: near Phillipsreuth, 48.86 N 13.68 E, isolated from Dactylis sp., 1998, S. Pažoutová (LM576 = DAOMC 251717 = CCC434 ex-epitype).

Claviceps arundinis

Czech Republic: Haklovy Dvory, Stary Vrbensky pond, 49.01 N 14.43 E, isolated from Phragmites australis, 11 Jan. 2008, M. Kolařík (LM583 = DAOMC 251724 = CCC933 ex-type).

Nucleotide Sequence Accession Numbers

The genome data for the nine strains were deposited at DDBJ/EMBL/GenBank under BioProject PRJNA449361. The versions described in this paper are QERD01000000, QEQW01000000, QEQX01000000, QEQY01000000, QEQZ01000000, QERA01000000, QERB01000000, QERC01000000 and QERE01000000. Raw reads were deposited in NCBI SRA (http://www.ncbi.nlm.nih.gov/sra) under accession number SRP139610.

Materials and Methods

Fungal mycelia and spores were inoculated onto Potato Dextrose Agar (PDA; BD Difco™) by streaking the medium surface in Petri dishes with inoculation loops, so that the fungus would cover the dish quickly. The plates were then incubated for 10 d at 20 °C in the dark. Mycelia from 1–2 dishes were harvested and ground in liquid N2, and genomic DNA was extracted using a modified cetyltrimethyl ammonium bromide (CTAB) protocol (Doyle & Doyle 1987). The protocol was modified to remove RNA by adding extra RNase and extending the incubation, as follows:

(1) After homogenization of ground fungal tissues with 700 µl 2X CTAB, 140 µl RNase Cocktail™ Enzyme Mix (Invitrogen by Thermo Fisher Scientific) and 28 µl RNase A were added. These volumes are 4x the recommended ratios of 2.5 µl/50 µl sample and 1 µl/100 µl sample, respectively. The mixture was digested for 6–7 h at room temperature.

(2) Afterwards, 10 µl Proteinase K (50 µg/µl stock; Fisher Scientific by Thermo Fisher Scientific) was added to the suspension and incubated for 2 h at 55 °C.

(3) Before DNA precipitation, another 35 µl RNase Cocktail (the recommended volume ratio, without extra) was added to the supernatant and incubated for 45 min to 1 h at room temperature. Then 650 µl CHCl3 was added and the sample was centrifuged at 13 000 rpm for 15 min.

Potential RNA contamination in the DNA samples was checked by running the samples on 1 % agarose gels and by determining the 260/280 ratio on a NanoDrop 1000 Spectrophotometer v3.8 (Thermo Fisher Scientific), to ensure no RNA was present. gDNA was quantified using a Qubit® 2.0 Fluorometer (Invitrogen by Life Technologies).
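The RNase volumes in step (1) come from scaling the manufacturer's ratio to the 700 µl homogenate and then multiplying by four. The 35 µl in step (3) is the unscaled ratio. Note that step (3) assumes a supernatant volume of about 700 µl, which the text does not state. A minimal sketch of the arithmetic follows. The function name is illustrative.

```python
def enzyme_volume(sample_ul: float, enzyme_ul: float, per_sample_ul: float,
                  fold: float = 1.0) -> float:
    """Enzyme volume (µl) for a sample, given a ratio of enzyme_ul per per_sample_ul."""
    return sample_ul * enzyme_ul / per_sample_ul * fold

ctab_ul = 700  # homogenate volume in 2X CTAB
cocktail_4x = enzyme_volume(ctab_ul, 2.5, 50, fold=4)  # RNase Cocktail, step (1)
rnase_a_4x = enzyme_volume(ctab_ul, 1.0, 100, fold=4)  # RNase A, step (1)
cocktail_1x = enzyme_volume(ctab_ul, 2.5, 50)          # RNase Cocktail, step (3); assumes ~700 µl
print(cocktail_4x, rnase_a_4x, cocktail_1x)  # 140.0 28.0 35.0
```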

The extracted gDNAs were normalized to 400 ng and sheared to a 300 bp insert size using a Covaris LE220 instrument. The fragmented inserts were used as templates to construct PCR-free libraries with a NxSeq AmpFREE Low DNA Library kit (Lucigen), following the manufacturer’s instructions. Indexed libraries were combined into two pools. Sequencing runs were carried out on a NextSeq (Illumina; Molecular Technologies Laboratory, Ottawa Research & Development Centre, Agriculture and Agri-Food Canada) using 2x150 bp NextSeq Mid Output Reagent Kits (Illumina).

The FastQC software v0.11.5 (https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was used to assess raw read quality. Low-quality data were removed with Trimmomatic v0.36 (Bolger et al. 2014) using the following parameters: HEADCROP:20 SLIDINGWINDOW:4:20 MINLEN:36. The trimmed reads were error-corrected with BayesHammer (Nikolenko et al. 2013). De novo genome assembly was performed using SPAdes v3.10.1 (Bankevich et al. 2012) with the mismatch correction step enabled. Contigs shorter than 1000 bp were discarded. QUAST v4.5 (Gurevich et al. 2013) was used to evaluate the assemblies. Corrected reads were mapped back onto the contigs using Bowtie2 v2.0.0 (Langmead & Salzberg 2012). Alignments produced by Bowtie2 in SAM format were converted to sorted BAM format with SAMtools v0.1.19 (Li et al. 2009), and nucleotide coverage statistics were generated with Qualimap v2.2 (Garcia-Alcalde et al. 2012). To evaluate the completeness of the genome assemblies, Benchmarking Universal Single-Copy Orthologs (BUSCO v2; Simão et al. 2015) was run on the contigs using the fungal database (odb9). Genome annotation was carried out using GeneMark-ES v4.38 (Lomsadze et al. 2005) with the “fungus” option enabled (Ter-Hovhannisyan et al. 2008). Annotations were validated using Genome Annotation Generator (Hall et al. 2014) and tbl2asn (http://www.ncbi.nlm.nih.gov/genbank/tbl2asn2/). All statistics are summarized in Table 1.
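Trimmomatic applies its steps in the order listed. HEADCROP:20 removes the first 20 bases. SLIDINGWINDOW:4:20 cuts the read once the mean Phred quality of a 4-base window falls below 20. MINLEN:36 then drops reads shorter than 36 bases. The sketch below is a simplified approximation written for illustration; it is not Trimmomatic's own code. In particular, it cuts at the start of the first failing window, whereas Trimmomatic refines the cut point within that window. Quality scores are assumed to be already decoded to integers.

```python
from typing import Optional

def trim_read(seq: str, quals: list[int], headcrop: int = 20, window: int = 4,
              min_q: float = 20.0, min_len: int = 36) -> Optional[tuple[str, list[int]]]:
    """Approximate HEADCROP -> SLIDINGWINDOW -> MINLEN; return None if the read is dropped."""
    # HEADCROP: unconditionally remove the first `headcrop` bases
    seq, quals = seq[headcrop:], quals[headcrop:]
    # SLIDINGWINDOW (simplified): scan 5'->3', cut at the start of the first failing window
    keep = len(quals)
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < min_q:
            keep = start
            break
    seq, quals = seq[:keep], quals[:keep]
    # MINLEN: discard reads that are now too short
    if len(seq) < min_len:
        return None
    return seq, quals

# Toy read: 80 bases at Q30 followed by 20 bases at Q10
result = trim_read("A" * 100, [30] * 80 + [10] * 20)
print(len(result[0]) if result else "dropped")  # 59
```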

Table 1 Statistics of Claviceps genomes sequenced in this study.

To confirm the identities of the strains, partial elongation factor 1-α gene sequences were extracted from each of the nine genome assemblies using Geneious 10.0.9 (Kearse et al. 2012). The fragments were aligned with representative sequences developed by Pažoutová et al. (2015) and Van der Linde et al. (2016) using MAFFT version 7 (https://mafft.cbrc.jp/alignment/server/; Katoh & Standley 2013). Maximum Parsimony (MP) analyses were performed using PAUP* 4.0b10 (Swofford 2002), with a heuristic search of 200 random stepwise-addition replicates and 1000 bootstrap replicates. Bayesian Inference (BI) was conducted using MrBayes 3.2 (Ronquist et al. 2012). The analysis used two independent runs of four chains each, 100 000 000 MCMC generations, and a 25 % burn-in.

Results and Discussion

The strains sequenced include two strains each of Claviceps purpurea s.str. and C. cf. spartinae, one strain each of C. humidiphila and C. arundinis, and three strains of C. purpurea s.lat. (Table 1, Fig. 1). The genomes were assembled into 1423 to 2321 contigs, with a mean assembly size of 30.8 Mb (range 28.6 Mb to 35.9 Mb). The average GC content was 51.5 %. The N50 ranged from 21.6 kb to 46.6 kb. The assemblies had BUSCO completeness scores ranging from 96.9 % to 98.6 %, and the number of gene models ranged from 8410 to 9230.
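The contig counts, assembly sizes and GC contents above are computed after contigs shorter than 1000 bp are discarded. The sketch below computes the same summary in plain Python. It assumes contigs are supplied as a dict mapping names to sequences, for example parsed from the SPAdes FASTA output. The function and contig names are illustrative and do not reproduce the QUAST implementation. This sketch excludes N bases from the GC denominator; other tools may treat them differently.

```python
def assembly_summary(contigs: dict[str, str], min_len: int = 1000) -> dict[str, float]:
    """Contig count, total length (bp) and GC % after discarding contigs below min_len."""
    kept = [s.upper() for s in contigs.values() if len(s) >= min_len]
    total = sum(len(s) for s in kept)
    gc = sum(s.count("G") + s.count("C") for s in kept)
    acgt = sum(s.count(b) for s in kept for b in "ACGT")  # Ns excluded from GC %
    return {
        "contigs": len(kept),
        "total_bp": total,
        "gc_percent": round(100.0 * gc / acgt, 2) if acgt else 0.0,
    }

toy = {"c1": "GC" * 600, "c2": "AT" * 700, "c3": "GGCC" * 100}  # c3 is only 400 bp
print(assembly_summary(toy))  # {'contigs': 2, 'total_bp': 2600, 'gc_percent': 46.15}
```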

The genome of Claviceps purpurea strain 20.1 (NCBI accession no. CAGA00000000.1) was previously published by Schardl et al. (2013). Their assembly was similar to ours in number of contigs (1442), number of gene models (8823), total assembly size (30.9 Mb), N50 (46.5 kb) and GC content (51.6 %), suggesting that our assemblies are of reasonable quality. In addition, they ordered the contigs into 191 scaffolds and annotated the genome. We ran BUSCO on their assembly and obtained a completeness score of 97.6 %. Our aim is to use these sequenced Claviceps genomes to better understand species diversity and genetic variation, and to establish correlations with ergot alkaloid chemical profiles.

IMA Genome-F 10B

Assembling pseudomolecules for the pitch canker pathogen, Fusarium circinatum, utilising additional genome sequence data and synteny within the Fusarium fujikuroi complex

Introduction

Fusarium includes a diverse group of filamentous ascomycetes (Geiser et al. 2013). Many of these fungi cause diseases on economically important plants, with an estimated 80 % of cultivated crops having at least one associated Fusarium disease (Leslie & Summerell 2006). Within the Fusarium fujikuroi species complex (FFSC), more than 50 phylogenetically distinct species have been grouped into three biogeographical clades (O’Donnell et al. 1998, 2000). Fusarium circinatum, residing in the so-called “American clade”, is the causal agent of the disease known as pine pitch canker that damages susceptible Pinus spp. and Pseudotsuga menziesii (Douglas Fir). It has a cosmopolitan distribution and is associated with significant economic losses due to widespread seedling mortality in nurseries as well as the reduction of growth in mature trees due to dieback of infected branches (Gordon et al. 2015).

The importance of this pathogen has justified sequencing its genome (GenBank accession AYJV00000000, version AYJV01000000) (Wingfield et al. 2012). The sequenced isolate, FSP34 (Gordon et al. 1996), has a genetic linkage map (De Vos et al. 2007). This map has been anchored to the genome (De Vos et al. 2014), enabling localization of quantitative trait loci (QTLs) to the genomic sequence data (De Vos et al. 2011, Van Wyk et al. 2018). The draft assembly was 94.8 % complete (Waterhouse et al. 2017). However, it was fragmented into a large number of contigs (4145) (Wingfield et al. 2012, De Vos et al. 2014), which limits its use in comparative genomic studies.

Whole genome sequences of other members of the FFSC are available, with their genomes assembled into chromosomes. These include Fusarium verticillioides, for which sequences of only 11 chromosomes are available (Ma et al. 2010). This F. verticillioides assembly lacks the twelfth and smallest chromosome known to exist in members of the FFSC. The chromosome is missing because it is dispensable and strain-specific within the FFSC (Xu et al. 1995, Ma et al. 2010, Wiemann et al. 2013, Van der Nest et al. 2014). In contrast, complete sequences have been determined for all twelve chromosomes of F. fujikuroi (Wiemann et al. 2013). These two species represent two of the three biogeographical clades of the FFSC. Comparisons among them and F. temperatum have shown a significant level of macrosynteny at the genomic sequence level (Wiemann et al. 2013, De Vos et al. 2014). This indicates that the genomic content of chromosomes is highly conserved among species in the FFSC.

The aim of this study was to improve the available draft assembly of F. circinatum, and assemble it into pseudomolecules that are comparable with the chromosomes of other members of the FFSC. For this purpose, we utilized additional genome sequence information (i.e. mate-pair sequence data) to allow for the scaffolding of contigs. We then exploited the macrosynteny that characterizes the genomes of species within FFSC (Wiemann et al. 2013, De Vos et al. 2014), to orientate and order these scaffolds into twelve pseudomolecules. In this study we present the pseudomolecule assemblies for the full chromosomal complement of F. circinatum. This improved genome assembly will aid in genome-sequence based studies, particularly those involving chromosomal comparisons. Addition of the F. circinatum pseudomolecule complement will furthermore enable comparative studies focusing on genomic synteny and architecture between the three biogeographic clades within the FFSC, as well as more broadly in the genus Fusarium.

Sequenced Strain

USA: California: isol. Monterey pine (Pinus radiata), 1996, T.R. Gordon, A.J. Storer & D. Okamoto (FSP34, MRC7870, CMW51752, PREM 62197-dried culture).

Nucleotide Sequence Accession Number

The improved genome assembly for Fusarium circinatum, generated in this study, has been deposited at DDBJ/EMBL/GenBank under the accession AYJV00000000, version AYJV02000000.

Materials and Methods

Fusarium circinatum was grown in half-strength potato dextrose broth (20 % potato dextrose broth w/v) and incubated at 25 °C in the dark on an orbital shaker at 120 rpm. After 7 d, DNA was extracted following the protocol of Möller et al. (1992), and DNA quality was assessed using a NanoDrop™ Spectrophotometer.

Additional genomic sequence data from F. circinatum isolate FSP34 and a second isolate, KS17, were used to scaffold the original 4145-contig assembly (Wingfield et al. 2012). Isolate KS17 was cultured from infected root tissue of P. radiata nursery seedlings collected in the Western Cape, South Africa, in 2005 (Steenkamp et al. 2014). The genomes of F. circinatum isolates FSP34 and KS17 were sequenced from mate-pair libraries (1 kb insert size) using SOLiD™ V4 technology (Applied Biosystems) at Seqomics (Hungary). In total, 82.45 and 153.95 million mate-pair reads were obtained for the respective isolates. Poor-quality reads (below Q20), reads shorter than 36 bp and duplicate reads were removed in CLC Genomics Workbench v.5.1 (CLCbio, Aarhus).

SSPACE v. 2.0 (Boetzer et al. 2011) was utilized to scaffold the pre-assembled contigs of the FSP34 assembly (GenBank accession no. AYJV00000000) (Wingfield et al. 2012), using the trimmed mate-pair data. Default parameters were used, but the minimum number of paired reads linking contigs to form a scaffold was set to 200. The average genome coverage was calculated using the Lander/Waterman equation (number of reads×read length/genome size).
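The Lander/Waterman expression for average coverage is C = N × L / G, where N is the number of reads, L the read length and G the genome size. A minimal Python sketch follows. The input values in the usage line are hypothetical round numbers chosen for illustration; they are not the SOLiD read counts or read lengths from this study.

```python
def lander_waterman_coverage(n_reads: int, read_len: float, genome_size: float) -> float:
    """Average fold coverage C = N * L / G."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return n_reads * read_len / genome_size

# Hypothetical example: 10 million 100 bp reads over a 40 Mb genome
print(lander_waterman_coverage(10_000_000, 100, 40_000_000))  # 25.0
```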

The resulting scaffolds were then assembled into 11 contiguous pseudomolecules (representing chromosomes 1–11), using F. verticillioides as a reference genome. The scaffolds were ordered and orientated based on BLAST searches (Altschul et al. 1990) against a local database of the F. verticillioides genome (DDBJ/EMBL/GenBank accession number AAIM00000000.2) in CLC Genomics Workbench. To assemble pseudomolecule 12, scaffolds were ordered and orientated in the same way relative to chromosome 12 of F. fujikuroi (Wiemann et al. 2013) and F. temperatum (Wingfield et al. 2015b). Breaks between the scaffolds comprising a pseudomolecule were marked by inserting 100 N’s. Synteny maps between the chromosomes of F. verticillioides and F. fujikuroi and the pseudomolecules of F. circinatum were generated with MUMmer v. 3.22 (Kurtz et al. 2004).
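Once the order and orientation of the scaffolds are known, joining them into a pseudomolecule is straightforward. Scaffolds placed on the minus strand are reverse-complemented, and all scaffolds are concatenated with a 100-N spacer. The sketch below takes the layout as given; in the study, the layout came from the BLAST hits against the reference. The function names and scaffold names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (N stays N)."""
    return seq.translate(COMPLEMENT)[::-1]

def build_pseudomolecule(scaffolds: dict[str, str],
                         layout: list[tuple[str, str]],
                         gap: int = 100) -> str:
    """Concatenate scaffolds in layout order; '-' entries are reverse-complemented."""
    pieces = []
    for name, strand in layout:
        seq = scaffolds[name]
        pieces.append(reverse_complement(seq) if strand == "-" else seq)
    return ("N" * gap).join(pieces)  # a run of N's marks each scaffold junction

scaffolds = {"scf1": "AACC", "scf2": "GGTT"}
pm = build_pseudomolecule(scaffolds, [("scf1", "+"), ("scf2", "-")])
print(len(pm))  # 108 (4 + 100 + 4)
```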

The assembled genome was annotated using the MAKER annotation pipeline (Cantarel et al. 2008), utilizing GeneMark-ES (Ter-Hovhannisyan et al. 2008), Augustus (Stanke & Morgenstern 2005) and SNAP (Korf 2004). The predicted annotations were also curated manually (Wingfield et al. 2012). Additional evidence was included in the form of genome data from F. verticillioides, Fusarium oxysporum f. sp. lycopersici and F. graminearum (Ma et al. 2010), and expressed sequence tag (EST) data for F. circinatum (Wingfield et al. 2012).

Results and Discussion

The improved genome assembly for Fusarium circinatum, generated in this study, has been deposited at DDBJ/EMBL/GenBank under accession no. AYJV00000000, version AYJV02000000. The F. circinatum genome was assembled into 585 scaffolds that together comprised 43 932 912 bases of DNA, with an N50 of 363 633 bp. The genome coverage was 273.82x. The GC content was 47.41 %, which is comparable to other sequenced Fusarium species within the FFSC (Ma et al. 2010, Wingfield et al. 2012, Jeong et al. 2013, Wiemann et al. 2013, Van der Nest et al. 2014, Chiara et al. 2015, Wingfield et al. 2015a, b, Niehaus et al. 2017a, b, Wingfield et al. 2017, Gardiner 2018, Srivastava et al. 2018, Van Wyk et al. 2018, Wingfield et al. 2018). A total of 14 923 protein-coding genes were predicted, giving a gene density of 339.68 open reading frames (ORFs) per million base pairs. Phylogenetic analysis of the sequenced genome confirmed its taxonomic identity as F. circinatum (Fig. 2).
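The gene density is the number of predicted protein-coding genes divided by the assembly length in megabases. A short check with the values reported above:

```python
genes = 14_923
assembly_bp = 43_932_912

# genes per million base pairs of assembled sequence
density_per_mb = genes / (assembly_bp / 1_000_000)
print(round(density_per_mb, 2))  # 339.68
```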

Fig. 2

Maximum likelihood tree based on partial gene sequences of β-tubulin and translation elongation factor 1-α (Scauflaire et al. 2011, De Vos et al. 2014). Sequence alignments were assembled with MAFFT version 7 (Katoh & Standley 2013). The best-fit substitution model (TIM2+I+G) with gamma correction (Tavaré 1986) was determined with jModelTest v 2.1.7 (Guindon & Gascuel 2003, Darriba et al. 2012). A maximum likelihood (ML) phylogenetic analysis was performed using PhyML v 3.1 (Guindon et al. 2010). Values at branch nodes are bootstrap support values; only values ≥ 85 % are shown. The F. circinatum FSP34 isolate used in this study is indicated in bold.

Pseudomolecules corresponding to each of the 11 chromosomes of F. verticillioides were constructed in this study. Pseudomolecule 12 was assembled according to the synteny observed with chromosome 12 of F. fujikuroi. We anchored 96.97 % (ca. 42.60 Mb) of the F. circinatum scaffold sequence to these 12 pseudomolecules. These pseudomolecules harbour 99.09 % of the 15 060 genes originally predicted for F. circinatum (Wingfield et al. 2012). Genomic alignments of the 12 pseudomolecules to the corresponding F. verticillioides and F. fujikuroi chromosomes are shown in Fig. 3. These dot plots illustrate the macrosynteny among Fusarium species within the FFSC.
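The anchored length and gene count can be cross-checked against the totals given in the text: the 43 932 912 bp assembly and the 15 060 originally predicted genes. A minimal check that converts the two percentages back into absolute values:

```python
assembly_bp = 43_932_912
original_genes = 15_060

anchored_mb = 0.9697 * assembly_bp / 1e6               # sequence placed on pseudomolecules
genes_on_pseudomolecules = round(0.9909 * original_genes)
print(f"{anchored_mb:.2f} Mb, {genes_on_pseudomolecules} genes")  # 42.60 Mb, 14923 genes
```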

Fig. 3

Whole genome comparisons of: A. Fusarium verticillioides chromosomes to F. circinatum pseudomolecules. B. F. fujikuroi chromosomes to F. circinatum pseudomolecules. In the dot plot alignments, forward matches are indicated by purple dots and reverse matches by blue dots.

Conclusions

Resequencing has provided a vastly improved assembly for Fusarium circinatum characterized by fewer and significantly larger scaffolds. By making use of the macrosynteny known to characterize the genomes of FFSC species (Wiemann et al. 2013, De Vos et al. 2014), we were able to join these scaffolds into 12 pseudomolecules that represent the haploid chromosome number of this fungus.

This new assembly of the F. circinatum genome is a considerably more robust and complete representation of the pathogen's genome. It includes pseudomolecules that correspond to the twelve chromosomes of F. circinatum. These pseudomolecule sequences will enable future comparative studies at the chromosomal level, both between and within the three biogeographic clades of the FFSC. In addition, comparisons of chromosomal architecture will expand our knowledge of the genomes of FFSC species, highlighting inter- and intraspecific similarities and differences. This would broaden our understanding of the evolution of an important group of plant pathogens.

IMA Genome-F 10C

Draft genome sequence of Quambalaria eucalypti

Introduction

The genus Quambalaria (Quambalariaceae, Microstromatales, Exobasidiomycetes, Basidiomycotina) includes mainly leaf and shoot pathogens of trees in the Myrtaceae (De Beer et al. 2006, Pegg et al. 2009). There are two exceptions. Q. coyrecup is a canker pathogen of marri (Corymbia calophylla) in Western Australia (Paap et al. 2008), and Q. cyanescens is an opportunistic pathogen of mainly immunocompromised or debilitated humans (Kuan et al. 2015). Q. cyanescens is also frequently isolated from galleries of bark beetles infesting hardwoods, although its ecological role in these ecosystems remains enigmatic (Kolařík et al. 2006). Of the leaf and shoot pathogens in the genus, Q. pitereka and Q. eucalypti are the most important. Quambalaria pitereka affects Corymbia species in Australia (Pegg et al. 2009) and China (Zhou et al. 2008). Q. eucalypti causes disease on Eucalyptus spp. in South Africa (Wingfield et al. 1993), Brazil (Alfenas et al. 2001), Uruguay (Perez et al. 2008), Australia (Pegg et al. 2008), and Portugal (Bragança et al. 2015).

Research on species of Quambalariaceae has mostly focussed on their classification and taxonomy (De Beer et al. 2006, Paap et al. 2008, Kijpornyongpan & Aime 2017) and on their pathogenicity and impact on tree health (Pegg et al. 2009, 2011). However, little is known about the life cycle and general biology of these fungi. They are related to the smut fungi and to the human pathogenic members of Malasseziales (De Beer et al. 2006, Wang et al. 2015b). For example, it is not yet known whether Quambalariaceae can reproduce sexually and, if so, what mating system they possess.

Recently the first genome-based studies started exploring pathogenicity factors in Ustilaginomycotina, and although two species in Microstromatales were included in the comparative analyses, no representative of Quambalariaceae was incorporated (Kijpornyongpan et al. 2018). Whole genome sequences have also been shown to be extremely valuable to study mating systems in smut fungi (Que et al. 2014), the Malasseziales (Xu et al. 2007), and the more distantly related Polyporales (James et al. 2013).

The aim of this study was to produce a draft genome sequence of Q. eucalypti. This genome sequence will allow for the exploration and comparative analyses of genes involved in pathogenicity and mating for this pathogen. Here we report the draft genome sequence of isolate CMW1101, an isolate representing the holotype (PREM 51089) of Q. eucalypti.

Sequenced Strain

South Africa: KwaZulu-Natal: Kwambonambi, Eucalyptus grandis clone TAG12, May 1987, M.J. Wingfield (CMW1101 = CBS118844 ex-holotype isolate; PREM 51089, holotype).

Nucleotide Accession Number

The draft genome sequence of Quambalaria eucalypti has been deposited at GenBank under accession no. RRYC00000000. The version presented here is RRYC01000000.

Materials and Methods

Genomic DNA was extracted from cultures grown on Malt Yeast Agar (2 % malt extract, 0.5 % yeast extract, 2 % agar; Biolab, Midrand, South Africa) using the method described by Duong et al. (2013). The genomic DNA was sent to Macrogen (South Korea), where one paired-end library with a 500 bp insert size was prepared. The library was sequenced on an Illumina HiSeq 2500 to generate 250 bp paired-end reads, with a target coverage of 100×.

The raw sequencing reads were imported into CLC Genomics Workbench v. 7.5.1 (CLCbio, Aarhus). Default settings were used both to trim the reads for quality and to produce a de novo genome assembly from the trimmed reads. The completeness of the assembly was evaluated with the Benchmarking Universal Single-Copy Orthologs tool (BUSCO v. 1.1b1) and the Basidiomycota dataset (Simão et al. 2015). The number of protein-coding genes was determined using Augustus v. 3.3.2 (Stanke et al. 2008) with the pre-optimised species model for Ustilago maydis.

Results and Discussion

The paired-end sequencing yielded just over 31 million reads. Assembly of the trimmed reads resulted in 966 contigs, ranging from 449 bp to 225 583 bp. The average contig size was 24 384 bp and the N50 was 62 600 bp. The genome size, estimated as the sum of the contig lengths, is approximately 23.5 Mb, with a GC content of 60 %. This estimate is larger than most genome sizes reported for Exobasidiomycetes, which typically range from 17 Mb to 19 Mb; the exception is Tilletia caries, at 29.5 Mb (Konishi 2013, Saika et al. 2014, Toome et al. 2014, Wang et al. 2015a, Kijpornyongpan et al. 2018). BUSCO analysis indicated an assembly completeness of 84.5 %. Of the 1335 BUSCO groups searched, the assembly contained 1128 complete BUSCOs (1093 complete single-copy and 35 complete and duplicated), 129 fragmented BUSCOs and 78 missing BUSCOs. AUGUSTUS predicted 7241 putative protein-coding regions. Phylogenetic analysis of sequences from the genome confirmed its taxonomic identity as Q. eucalypti (Fig. 4). The Q. eucalypti genome will allow this species to represent the family Quambalariaceae in comparative studies with other members of the class Exobasidiomycetes. Such studies could address topics such as pathogenicity factors, mating and evolution.
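BUSCO reports completeness as the share of searched groups found complete, counting both single-copy and duplicated BUSCOs. The counts above reproduce the reported 84.5 %. The helper below is illustrative: it formats a line in the style of BUSCO's one-line summary but does not parse BUSCO output files.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """One-line summary in the style of BUSCO's short summary."""
    complete = single + duplicated
    total = complete + fragmented + missing

    def pct(n: int) -> str:
        return f"{100.0 * n / total:.1f}%"

    return (f"C:{pct(complete)}[S:{pct(single)},D:{pct(duplicated)}],"
            f"F:{pct(fragmented)},M:{pct(missing)},n:{total}")

line = busco_summary(single=1093, duplicated=35, fragmented=129, missing=78)
print(line)  # C:84.5%[S:81.9%,D:2.6%],F:9.7%,M:5.8%,n:1335
```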

Fig. 4

Phylogram resulting from an ML analysis using RAxML, based on ITS sequences of selected reference sequences representing all species of Quambalaria. The isolate from which the genomic DNA was extracted is indicated in bold type. T = ex-type isolates; NT = Northern Territory; NSW = New South Wales; WA = Western Australia; and QLD = Queensland. Support values at branches resulted from 1000 bootstrap replicates; only values above 75 % are indicated.

IMA Genome-F 10D

Draft genome sequence of the Eucalyptus pathogen Teratosphaeria destructans

Introduction

The genus Teratosphaeria includes numerous economically important tree pathogens of plantation eucalypts in tropical and subtropical areas (Park et al. 2000, Crous et al. 2009, Hunter et al. 2011). The aggressive pathogen T. destructans was initially reported from Indonesia in 1996 (Wingfield et al. 1996), followed by reports from Thailand, Vietnam, East Timor, Laos, China, and, most recently, South Africa (Old et al. 2003, Burgess et al. 2006, Barber et al. 2012, Greyling et al. 2016). It causes leaf, bud and shoot blight disease in one- to three-year-old trees of Eucalyptus camaldulensis, E. grandis and E. urophylla as well as on hybrids of these species (Wingfield et al. 1996, Old et al. 2003, Barber 2004). The rapid spread of T. destructans over large distances has been attributed to the anthropogenic movement of germplasm to establish clonal Eucalyptus nurseries (Andjic et al. 2011).

The discovery of T. destructans in clonal E. grandis × E. urophylla plantations in South Africa, together with its reported rapid spread to new areas, makes this pathogen a major concern for the forestry industries of southern African countries (Andjic et al. 2011, Greyling et al. 2016). The ability of this pathogen to rapidly invade new areas is also a concern for Australia. There, spread of T. destructans to native Eucalyptus species could prove catastrophic for both commercial and natural vegetation (Old et al. 2003). In this study, we report the genome sequence of a South African isolate of T. destructans. A draft genome sequence for T. destructans will benefit studies on the genes and pathways involved in virulence, pathogenicity and sexual reproduction. Such studies will increase our understanding of the biology of this fungus, which is crucial for developing preventative and control measures for this tree pathogen.

Sequenced Strain

South Africa: KwaZulu-Natal: Kwambonambi, isol. Eucalyptus grandis × E. urophylla, Apr. 2015, I. Greyling (CMW 44962, PREM 62207, dried specimen).

Nucleotide Accession Number

The Teratosphaeria destructans Whole Genome Shotgun project has been deposited at DDBJ/ENA/GenBank under accession no. RIBY00000000. The version described in this paper is version RIBY01000000.

Materials and Methods

Teratosphaeria destructans isolate CMW 44962 was obtained from the culture collection (CMW) of the Forestry and Agricultural Biotechnology Institute (FABI) at the University of Pretoria. A single-spore culture was generated (Greyling et al. 2016) and grown on sterile cellophane sheets placed onto MEA+Y plates (malt extract agar; Biolab, South Africa; amended with 0.3 % yeast extract; Oxoid, Basingstoke). After incubation for four weeks at 25 °C, the mycelia were collected and freeze-dried for DNA extraction. Approximately 30 mg of freeze-dried material, three 5 mm glass beads and 13 mg polyvinylpyrrolidone (PVP, Sigma Aldrich, Steinheim) were mixed and powdered in a FastPrep FP120 homogenizer (Qbiogene, Carlsbad, CA). Next, 650 µl of CTAB extraction buffer (100 mM Tris-HCl, pH 8; 25 mM EDTA; 2 M NaCl; 3.5 % CTAB; 2 % SDS), 2 µl of 500 mg/L spermidine (Sigma Aldrich, Steinheim) and 4.5 % (v/v) β-mercaptoethanol were added, and the sample was incubated at 60 °C for 20 min. After cooling to room temperature, 1.5 volumes of chloroform:isoamyl alcohol (24:1) were added, and the sample was mixed and centrifuged for 15 min at 8100 g. The supernatant was re-extracted with 1.5 volumes of chloroform:isoamyl alcohol (24:1) and centrifuged for 15 min at 16 300 g. Potassium acetate was then added to the supernatant to a final concentration of 1.5 M. After incubation for 30 min at −20 °C, 1.5 volumes of cold isopropanol were added, followed by a 30 min incubation at 24 °C. The sample was then centrifuged for 20 min at 16 300 g to collect the DNA, which was washed twice with 70 % ethanol. The dried DNA pellet was dissolved in 50 µl low TE (Tris-EDTA) buffer (Thermo Fisher Scientific, Wilmington, NC).
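Bringing a solution to a target concentration follows from the mass balance C_stock × V_add = C_final × (V + V_add), which gives V_add = V × C_final / (C_stock − C_final). The text does not give the potassium acetate stock concentration or the supernatant volume. The 5 M stock and 600 µl supernatant below are therefore hypothetical values chosen for illustration.

```python
def volume_to_add(sample_ul: float, final_m: float, stock_m: float) -> float:
    """Stock volume (µl) that brings sample_ul to final_m, from C_s*V_add = C_f*(V + V_add)."""
    if stock_m <= final_m:
        raise ValueError("stock must be more concentrated than the target")
    return sample_ul * final_m / (stock_m - final_m)

# Hypothetical: 600 µl supernatant, 5 M potassium acetate stock, 1.5 M target
v = volume_to_add(sample_ul=600, final_m=1.5, stock_m=5.0)
print(round(v, 1))  # 257.1
```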

The genomic DNA sample was submitted to the Central Analytical Facilities (CAF) of Stellenbosch University (Stellenbosch, South Africa) for whole genome sequencing using the Ion GeneStudio S5 Next-Generation Sequencer. An Ion 530 chip with a capacity to generate 12 million reads of 600 bp was prepared as per the manufacturer’s instructions. The resulting single reads were trimmed and assembled using SPAdes v. 3.12.0 (Nurk et al. 2013) with k-values of 21, 33, 55, 77, 99 and 127. The completeness of the genome assembly was evaluated using the Benchmarking Universal Single-Copy Orthologs (BUSCO v. 2.0.1) tool in conjunction with the fungal data set (Simão et al. 2015). Bowtie2 v. 1.1.2 and SAMtools v. 1.5 were used to calculate the average base coverage by mapping the reads back to the genome assembly (Li et al. 2009, Langmead & Salzberg 2012). QUAST v. 5.0.1 (Mikheenko et al. 2018) was used to calculate general genome statistics.
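The SPAdes manual requires k-mer sizes to be odd, below 128 and listed in ascending order; the values used here (21 to 127) meet these conditions. The sketch below validates such a list and builds a command line as a Python argument list without running it. The options `-s`, `-k`, `-o` and `--iontorrent` are documented SPAdes options. However, the text does not state whether `--iontorrent` was used in this study, and the file and directory names are hypothetical.

```python
K_VALUES = [21, 33, 55, 77, 99, 127]

def validate_kmers(ks: list[int]) -> None:
    """Check the SPAdes constraints on -k: odd values, below 128, ascending order."""
    for k in ks:
        if k % 2 == 0 or k >= 128:
            raise ValueError(f"invalid SPAdes k-mer size: {k}")
    if ks != sorted(ks):
        raise ValueError("k-mer sizes must be listed in ascending order")

def spades_command(reads: str, outdir: str, ks: list[int]) -> list[str]:
    """Build (but do not run) a SPAdes command for single-end Ion Torrent reads."""
    validate_kmers(ks)
    return ["spades.py", "--iontorrent", "-s", reads,
            "-k", ",".join(str(k) for k in ks), "-o", outdir]

cmd = spades_command("reads.fastq", "spades_out", K_VALUES)
print(" ".join(cmd))
# spades.py --iontorrent -s reads.fastq -k 21,33,55,77,99,127 -o spades_out
```

With SPAdes installed, the list could be passed to `subprocess.run(cmd, check=True)`.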

The β-tubulin and elongation factor 1-alpha (EF1-α) gene regions, commonly used for species delineation in Teratosphaeria (Quaedvlieg et al. 2014), were extracted from the T. destructans genome sequence. These sequences, together with previously published sequences (Quaedvlieg et al. 2014, Aylward et al. 2018) obtained from the NCBI database (https://www.ncbi.nlm.nih.gov/), were subjected to a “one click” phylogenetic analysis using the Phylogeny.fr online tool (Dereeper et al. 2008, 2010). This analysis included a MUSCLE alignment (Edgar 2004) and a Gblocks (Castresana 2000) curation step before phylogenetic inference with PhyML (Guindon & Gascuel 2003, Anisimova & Gascuel 2006).

Results and Discussion

The assembly totalled 32 316 120 bp in 4132 contigs, of which 1837 were 500 bp or larger and together contained 31.61 Mb. The genome had a GC content of 51.83 %, an N50 of 103 119 bp (largest contig 398 757 bp) and an L50 of 94 contigs. The average coverage of the assembly was 170×. BUSCO analysis estimated genome completeness at 95.86 %, with 278 complete, five fragmented and seven missing BUSCOs out of the 290 searched.
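N50 and L50 follow a standard definition: sort contigs from longest to shortest and accumulate their lengths; N50 is the length of the contig at which the running total first reaches half the assembly size, and L50 is the number of contigs needed to get there. The sketch below implements that definition on toy data; it is not QUAST's own code. The optional `min_length` filter mirrors the fact that QUAST by default ignores contigs shorter than 500 bp.

```python
from typing import Sequence, Tuple


def n50_l50(contig_lengths: Sequence[int], min_length: int = 0) -> Tuple[int, int]:
    """Return (N50, L50) for contigs of at least min_length bp."""
    kept = sorted((n for n in contig_lengths if n >= min_length), reverse=True)
    if not kept:
        raise ValueError("no contigs at or above min_length")
    half = sum(kept) / 2
    running = 0
    for count, length in enumerate(kept, start=1):
        running += length
        if running >= half:
            return length, count
    raise AssertionError("unreachable: the running total always reaches half")


toy = [100, 80, 60, 40, 20]
print(n50_l50(toy))                 # (80, 2)
print(n50_l50(toy, min_length=70))  # (100, 1)
```

Because the threshold changes which contigs are counted, N50 values are only comparable between assemblies when the same minimum contig length is applied.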

The genome sequence of Teratosphaeria destructans reported here is the first published genome sequence for a member of this genus (Fig. 5). Given its status as an emerging pathogen (Andjic et al. 2011, Greyling et al. 2016, Burgess & Wingfield 2017), the availability of a genome sequence offers several benefits. These include its use in comparative genomic studies of different Teratosphaeria species, many of which are a concern to the global Eucalyptus industry (Hunter et al. 2011). Similar studies have already yielded insight into the factors that determine host range, while also revealing arsenals of effectors and virulence factors important for the pathogenicity of fungal species (Klosterman et al. 2011, Condon et al. 2013, Zhao et al. 2013, Deng et al. 2017). The genome sequence also provides an opportunity to develop species-specific microsatellite markers (Gnocato et al. 2017, Rafiei et al. 2018), which would be valuable for studying invasive populations of this aggressive, emerging tree pathogen.

Fig. 5

A maximum likelihood phylogeny of Teratosphaeria species, including the Teratosphaeria destructans genome sequence reported here, based on the β-tubulin and EF1-α gene regions. Sequences for the other species were obtained from previous studies (Quaedvlieg et al. 2014, Aylward et al. 2018).

IMA Genome-F 10E

Draft nuclear genome assembly for Davidsoniella eucalypti

Introduction

The family Ceratocystidaceae includes an ecologically diverse assemblage of fungi currently classified into 11 genera (De Beer et al. 2014, 2017, Mayers et al. 2015, Nel et al. 2018). The best known of these is Ceratocystis, a genus that includes economically important plant pathogens such as the mango and Acacia mangium pathogen C. manginecans (Al Adawi et al. 2013, Tarigan et al. 2011) and the causal agent of cacao wilt, C. cacaofunesta (Baker Engelbrecht & Harrington 2005). Species of Endoconidiophora and Thielaviopsis are well known as pathogens of conifers and monocotyledonous plants, respectively (Mbenoun et al. 2014, Wingfield et al. 2013). The genera Bretziella and Berkeleyomyces were recently erected to accommodate the causal agent of oak wilt, Br. fagacearum (previously Ceratocystis fagacearum; De Beer et al. 2017), and the multi-host root pathogen Be. basicola (formerly Thielaviopsis basicola; Nel et al. 2018). Many Ceratocystidaceae species rely on flies, picnic beetles and ambrosia beetles for dispersal, forming casual associations with these insects (Van Wyk et al. 2013). In contrast, species in the genera Ambrosiella, Meredithiella and Phialophoropsis form obligate mutualisms with ambrosia beetles of the tribes Xyleborini, Corthylini and Xyloterini, respectively (Mayers et al. 2015). Chalaropsis contains three asexual species associated with woody substrates, while members of the genus Huntiella are considered saprobes or weak pathogens, with only a handful of species known to cause sapstain of timber (De Beer et al. 2014). The genus Davidsoniella consists of four species, three of which (D. eucalypti, D. neocaledoniae and D. australis) occur in Australasia, while D. virescens was described from North America (De Beer et al. 2014).

In this study we report a draft nuclear genome assembly for Davidsoniella eucalypti, which was first isolated from stem wounds on living Eucalyptus trees in Australia (Kile et al. 1996). Although able to colonize wounds made in these trees, the fungus causes limited damage and is not considered pathogenic (Kile et al. 1996). In contrast, D. virescens, D. australis and D. neocaledoniae are all considered plant pathogens, causing disease on maple trees (Acer spp.), Nothofagus cunninghamii trees (Kile 1993) and Coffea robusta plants (Dadant 1950), respectively.

Davidsoniella eucalypti and D. virescens are the only two species in the genus for which a sexual morph is known (De Beer et al. 2014), although their reproductive strategies differ dramatically: D. virescens is homothallic, whereas D. eucalypti is heterothallic (Harrington et al. 1998). With the genome sequence of D. virescens already published (Wingfield et al. 2015b), the addition of the D. eucalypti genome brings the number of sequenced Davidsoniella genomes to two. This opens the possibility of comparative genomic studies of these two biologically distinct species.

Sequenced Strain

Australia: Victoria: Cabbage Tree Creek, isolated from Eucalyptus sieberi, Aug. 1989, M.J. Dudzinski (CMW 3254, C 639; dried culture DAR 70205).

Nucleotide Sequence Accession Number

This Whole Genome Shotgun project for Davidsoniella eucalypti isolate CMW 3254 has been deposited at DDBJ/ENA/GenBank under the accession RMBW00000000. The version described in this paper is version RMBW01000000.

Materials and Methods

Davidsoniella eucalypti isolate CMW 3254 was obtained from the culture collection of the Forestry and Agricultural Biotechnology Institute (FABI) and grown on 2 % malt extract agar (MEA: 2 % w/v, Biolab, South Africa) at 25 °C. Genomic DNA was extracted from a 14-day-old culture using a previously described phenol-chloroform protocol (Roux et al. 2004). The isolated DNA was submitted to Macrogen (Seoul, Korea) for long-read sequencing on three cells of the Pacific Biosciences single-molecule real-time (SMRT, PacBio) sequencing platform. This was complemented by a second round of sequencing on an Illumina HiSeq 2500 instrument at Macrogen, for which a single library with a 550 bp insert size was prepared and used to produce paired-end reads with a target length of 250 bp. The DNA for Illumina sequencing was extracted as described by Duong et al. (2013).

The reads obtained from the three SMRT cells were concatenated into a single FASTQ file, which was used for read correction, trimming and assembly with Canu v1.4 using default settings (Koren et al. 2017). The resulting assembly was scaffolded with SSPACE-LongRead using default settings (Boetzer et al. 2011, Boetzer & Pirovano 2014) and the corrected long reads produced by Canu. The paired-end Illumina reads were imported into CLC Genomics Workbench v11.0.1 (Qiagen, South Africa) and trimmed using default settings. The trimmed reads were indexed and aligned to the scaffolded long-read assembly using BWA (Li & Durbin 2009) and SAMtools (Li et al. 2009). The resulting alignment files were used in three rounds of Pilon correction (Walker et al. 2014) to correct single-base differences, small insertions/deletions and other mis-assemblies in the draft genome assembly. To further improve the assembly, the trimmed Illumina paired-end reads were used by GapFiller (Boetzer & Pirovano 2012) to fill gaps introduced during scaffolding. Completeness of the draft genome was assessed with the Benchmarking Universal Single-Copy Orthologs tool (BUSCO v. 2.0.1; Simão et al. 2015) and the Ascomycota database. The number of protein-coding genes was estimated with the de novo prediction software AUGUSTUS using the Fusarium graminearum gene models (Keller et al. 2011, Stanke et al. 2006b), while general genome statistics (genome length, GC content, N50, L50 and largest contig size) were calculated with QUAST v5.0.1 (Mikheenko et al. 2018).
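The iterative Pilon step can be pictured as a loop in which each round's corrected FASTA becomes the reference for the next. The sketch below only generates a dry-run list of shell commands for three rounds; it assumes BWA-MEM for mapping (the text says only "BWA"), and the file names and `pilon.jar` path are hypothetical. Pilon's `--genome`, `--frags` and `--output` options are used, and Pilon writes `<output>.fasta` to the current directory by default.

```python
from typing import List


def pilon_polishing_plan(draft: str, reads_1: str, reads_2: str,
                         rounds: int = 3, pilon_jar: str = "pilon.jar") -> List[str]:
    """Return shell commands for iterative short-read polishing with Pilon.

    Each round indexes the current assembly, maps the paired reads with
    BWA-MEM, sorts and indexes the alignments with SAMtools, and runs Pilon.
    The corrected FASTA from one round is the reference for the next.
    """
    commands: List[str] = []
    genome = draft
    for i in range(1, rounds + 1):
        bam = f"round{i}.sorted.bam"
        prefix = f"pilon_round{i}"
        commands += [
            f"bwa index {genome}",
            f"bwa mem {genome} {reads_1} {reads_2} | samtools sort -o {bam} -",
            f"samtools index {bam}",
            f"java -jar {pilon_jar} --genome {genome} --frags {bam} --output {prefix}",
        ]
        genome = f"{prefix}.fasta"  # Pilon's output feeds the next round
    return commands


for cmd in pilon_polishing_plan("sspace_scaffolds.fasta", "R1.trim.fq", "R2.trim.fq"):
    print(cmd)
```

Returning strings rather than executing them keeps the sketch side-effect free; the commands could be reviewed and then run in a shell or a workflow manager.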

The 60S, LSU and MCM7 gene regions were extracted from the genome and, together with the corresponding regions from D. eucalypti, D. virescens, D. neocaledoniae, D. australis, Endoconidiophora polonica and E. laricicola (De Beer et al. 2014), were used for phylogenetic analysis. The datasets were subjected to a “one click” phylogenetic analysis using the Phylogeny.fr online tool (Dereeper et al. 2008, 2010), which included a MUSCLE alignment (Edgar 2004) and a Gblocks (Castresana 2000) curation step before phylogenetic inference with PhyML (Guindon & Gascuel 2003). Branch support was calculated using the approximate likelihood ratio test (Anisimova & Gascuel 2006).

Results and Discussion

The 41 874 515 bp genome assembly of Davidsoniella eucalypti comprised 1219 contigs of 1000 bp or larger, the largest of which was 1 065 836 bp. The genome had a GC content of 45.93 %, an average coverage of 69×, an N50 of 230 092 bp and an L50 of 51. AUGUSTUS predicted 9029 protein-coding genes, and BUSCO reported a completeness score of 96.1 %. This was based on the analysis of 1315 orthologs, of which 1236 were present as complete single copies, 27 as complete duplicated copies and 16 as fragmented copies, while 36 were missing.
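BUSCO categories can be cross-checked directly from the counts: complete BUSCOs are single-copy plus duplicated, the four categories sum to the number searched, and each percentage is a count divided by that total. The small dataclass below is an illustrative helper, not part of BUSCO itself.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class BuscoCounts:
    single: int      # complete, single-copy
    duplicated: int  # complete, duplicated
    fragmented: int
    missing: int

    @property
    def total(self) -> int:
        """Number of BUSCOs searched (all four categories combined)."""
        return self.single + self.duplicated + self.fragmented + self.missing

    def percentages(self) -> Dict[str, float]:
        """Percentages in BUSCO's C/S/D/F/M notation."""
        n = self.total
        return {
            "C": 100 * (self.single + self.duplicated) / n,
            "S": 100 * self.single / n,
            "D": 100 * self.duplicated / n,
            "F": 100 * self.fragmented / n,
            "M": 100 * self.missing / n,
        }


d_eucalypti = BuscoCounts(single=1236, duplicated=27, fragmented=16, missing=36)
print(d_eucalypti.total)  # 1315
print({k: round(v, 2) for k, v in d_eucalypti.percentages().items()})
```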

The genome assembly of D. eucalypti differed dramatically from that of its sister species D. virescens (Fig. 6) (Wingfield et al. 2015b). At 33.6 Mb, the D. virescens genome is 8.3 Mb smaller than that of D. eucalypti. The D. eucalypti genome is also predicted to encode more proteins (9029 vs. 6953 for D. virescens), although gene densities were comparable at 207 and 215 genes/Mb for D. virescens and D. eucalypti, respectively. The genomes of plant-pathogenic filamentous fungi tend to be larger than those of non-pathogenic relatives, mostly because of high amounts of repetitive DNA and an expanded effector repertoire (Frantzeskakis et al. 2018, Möller & Stukenbrock 2017, Raffaele & Kamoun 2012). The larger genome of the non-pathogenic D. eucalypti compared with the pathogenic D. virescens was therefore surprising and warrants further study.
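The size difference and gene densities quoted above are simple ratios: predicted genes divided by assembly length in megabases. The sketch reproduces them from the figures in the text; because the D. virescens size is given only to 0.1 Mb (33.6 Mb), its density is approximate.

```python
def genes_per_mb(gene_count: int, assembly_bp: float) -> float:
    """Gene density: predicted protein-coding genes per megabase of assembly."""
    return gene_count / (assembly_bp / 1_000_000)


d_euc = genes_per_mb(9029, 41_874_515)          # about 215.6
d_vir = genes_per_mb(6953, 33_600_000)          # about 206.9
size_gap_mb = (41_874_515 - 33_600_000) / 1e6   # about 8.27, i.e. ~8.3 Mb
print(f"{d_euc:.1f} {d_vir:.1f} {size_gap_mb:.2f}")
```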

Fig. 6

A maximum likelihood phylogeny showing the position of the Davidsoniella eucalypti isolate used for this genome. The four known species of Davidsoniella are represented, with two Endoconidiophora species used as the outgroup. Approximate likelihood ratio test values for branch support are shown as percentages.

The hybrid assembly of D. eucalypti presented here has an N50 value (230 092 bp) twice that of D. virescens (Wingfield et al. 2015b). This improvement in contiguity can be attributed to the inclusion of long-read PacBio sequences (English et al. 2012), a trend seen for many other genomes (Huddleston et al. 2014, Koren et al. 2013, Koren & Phillippy 2015). A highly contiguous genome sequence for one Davidsoniella species provides a basis for comparative genomic studies. Such studies should be of particular interest, as the higher gene number and larger genome size of D. eucalypti might point to an interesting evolutionary history for the genome of this species.

IMA Genome-F 10F

Draft genome sequence of Grosmannia galeiformis

Introduction

Grosmannia galeiformis was first described as Ceratocystis galeiformes from conifer-infesting bark and ambrosia beetles in Scotland in 1951 (Bakshi 1951). It was later transferred to Ophiostoma (as Ophiostoma galeiforme; Mathiesen-Käärik 1953) and thereafter to Grosmannia (as G. galeiformis; Zipfel et al. 2006).

Grosmannia galeiformis is commonly associated with conifer-infesting bark beetles and has been reported from Europe (Bakshi 1951, Linnakoski et al. 2012, Mathiesen-Käärik 1953, Zhou et al. 2004), South America (Zhou et al. 2004) and Africa (Zhou et al. 2004). Phylogenetic studies have shown that this species is part of a complex of several closely related species, known as the G. galeiformis species complex, which is distinct from other species complexes in Leptographium s. lat. (Chang et al. 2019, De Beer et al. 2013, Linnakoski et al. 2012; Fig. 7). In this study, we sequenced and assembled a draft genome of G. galeiformis, the key species representing the G. galeiformis species complex.

Fig. 7

Maximum likelihood phylogeny based on partial β-tubulin gene sequences, used to confirm the identity of the G. galeiformis isolate sequenced in this study. Bootstrap values ≥ 70 % (1000 replicates) are indicated at nodes. The β-tubulin sequence of G. galeiformis was extracted directly from the genome assembly; the other authenticated reference sequences were obtained from GenBank.

Sequenced Strain

United Kingdom: Elgin: on Pinus sylvestris (Scotch pine) infested with Tomicus piniperda, 29 Aug 1997, T. Kirisits & M.J. Wingfield (epitype isolate CMW 5290 = CBS 115711, PREM57491).

Nucleotide Sequence Accession Number

The genomic sequence of Grosmannia galeiformis (CMW 5290, CBS 115711) has been deposited at DDBJ/EMBL/GenBank under accession no. RQWE00000000. The version described in this paper is version RQWE01000000.

Methods

Genomic DNA was extracted from freeze-dried mycelium of a single-spore culture of G. galeiformis (CMW 5290), following a previously described method (Duong et al. 2013). Genome sequencing was carried out on the Illumina HiSeq 2000 platform (University of California, Davis, CA). Two libraries (350 and 550 bp insert sizes) were prepared and sequenced to obtain 100 bp paired-end reads. The reads were trimmed using Trimmomatic v. 0.36 (Bolger et al. 2014) and assembled de novo using SPAdes v. 3.9.0 (Bankevich et al. 2012). The SPAdes scaffolds were joined into larger scaffolds using SSPACE-Standard v. 3.0 (Boetzer et al. 2011), and assembly gaps were filled using GapFiller v. 1.10 (Boetzer & Pirovano 2012). The completeness of the resulting assembly was estimated using the Benchmarking Universal Single-Copy Orthologs (BUSCO) program v. 2.0.1 with the Sordariomyceta odb9 dataset (Simão et al. 2015). Protein-coding gene models were predicted using the MAKER genome annotation pipeline (Cantarel et al. 2008), with GeneMark v. 4.32 (self-trained; Lomsadze et al. 2005) and AUGUSTUS v3.2.2 (species models optimised for Neurospora crassa; Stanke et al. 2006a) as gene predictors.
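The Trimmomatic settings are not given in the text. To illustrate what quality trimming of this kind does, the sketch below implements a simplified sliding-window rule: windows are scanned from the 5′ end and the read is cut at the start of the first window whose mean Phred quality drops below a threshold. The window of 4 bases and threshold of Q15 are commonly used values assumed here for illustration, and Trimmomatic's own SLIDINGWINDOW step differs in detail.

```python
from typing import List, Sequence


def phred33(qual_string: str) -> List[int]:
    """Decode a Phred+33 FASTQ quality string into integer scores."""
    return [ord(c) - 33 for c in qual_string]


def sliding_window_cut(quals: Sequence[int], window: int = 4,
                       threshold: float = 15.0) -> int:
    """Return how many leading bases to keep (simplified sliding-window rule)."""
    if window <= 0:
        raise ValueError("window must be positive")
    for start in range(0, max(len(quals) - window + 1, 0)):
        if sum(quals[start:start + window]) / window < threshold:
            return start  # cut at the start of the first failing window
    return len(quals)  # no window failed: keep the whole read


quals = [30] * 6 + [10] * 4
print(sliding_window_cut(quals))  # 6
```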

Results and Discussion

More than 25 million 100 bp paired-end reads were obtained after filtering and trimming. The final draft assembly consisted of 869 scaffolds larger than 500 bp, with an N50 of 67.79 kb and a total size of approximately 26.44 Mb. BUSCO completeness scores for the assembly were C: 97 % [D: 5.8 %], F: 1.8 %, M: 0.6 %, n = 1348 (C: complete; D: duplicated; F: fragmented; M: missing; n: number of genes searched), comparable to those of other Leptographium s. lat. species sequenced in previous studies (Wingfield et al. 2015a, 2016). Genome annotation with the MAKER pipeline, using GeneMark and AUGUSTUS as gene predictors, yielded 8527 protein-coding gene models. The G. galeiformis genome generated in this study adds to the growing genomic resources for ophiostomatoid fungi (Wingfield et al. 2015a, b, 2016), which will facilitate future comparative genomic and evolutionary studies of these fungi (Fig. 7).
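No sequencing depth is reported for this assembly, but an expected depth can be estimated as total sequenced bases divided by genome size (the Lander-Waterman expectation). The sketch assumes that "25 million" counts individual 100 bp reads rather than read pairs; if it counts pairs, the estimate doubles. Since more than 25 million reads were obtained, either figure is a lower bound.

```python
def expected_coverage(n_reads: float, read_length_bp: float,
                      genome_size_bp: float) -> float:
    """Expected mean depth: total sequenced bases / genome size."""
    return n_reads * read_length_bp / genome_size_bp


# Assumption: 25 million individual 100 bp reads over a 26.44 Mb assembly.
print(round(expected_coverage(25e6, 100, 26.44e6), 1))  # 94.6
```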