Genomic data
To gain insights into the functional differences and the phylogenetic relationship between all fungal strains targeted in this work, their genomes were completely sequenced by application of 3rd generation sequencing technologies. The respective sequences are stored on the ENA (European Nucleotide Archive) portal of the EMBL-EBI (https://www.ebi.ac.uk/) under the bioproject numbers PRJEB36622 (A. truncatum), PRJEB36624 (D. concentrica), PRJEB36625 (E. liquescens), PRJEB36647 (Hypom. monticulosa), PRJEB37480 (Hypom. spongiphila) PRJEB36653 (Hypom. submonticulosa), PRJEB36654 (H. fragiforme), PRJEB36656 (H. lienhwacheense), PRJEB36657 (H. pulicicidum), PRJEB36658 (H. rickii), PRJEB36693 (J. multiformis), PRJEB36695 (H. rubiginosum), PRJEB36696 (P. hunteri) and PRJEB36697 (X. hypoxylon).
The genome sequences of the analyzed fungi were mainly established using a combination of Nanopore and short-read Illumina methods. The latter data were used to improve base accuracy and thus significantly reduce error rates in the final genomes. For comparative reasons two species (H. fragiforme and H. rubiginosum) were randomly chosen for PacBio sequencing. General genome features, e.g., size, contig number, GC content, and numbers of predicted genes, are summarized in Table 1.
Table 1 Details of the genome sequences generated for the selected Xylariales The number of reads generated by Nanopore sequencing ranged from 83,869 to 937,388 with an average read count of 406,479. The mean read length averaged 9736 bp (5454–27,241 bp) with a maximum read length average of 470,823 bp (64,165–2,031,748 bp). Illumina polished assemblies resulted in a total contig number average of 55 (16–123) with a 61 fold average coverage (14.8–125.1 fold). Details on the genome sequencing statistics for the individual strains are listed in the supplementary information (Tab. S1). Established genomes range in size from 35 to 54 Mb and feature GC contents around 45%.
Sequencing on the Sequel System (PacBio) resulted in 727,768 (sub)reads with a mean length of 7823 bp for H. fragiforme. The genome assembly yielded 36 contigs summing up to a final genome size of 38.1 Mbp (3.6 Mbp N50 contig length, 127× Genome Coverage). Sequencing of H. rubiginosum resulted in 692,592 (sub)reads with a mean length of 14,154 bp. The genome assembly yielded 70 contigs summing up to a final genome size of 48.3 Mbp (1.1 Mbp N50 contig length, 202× Genome Coverage).
In general, the combinatorial approach (Illumina/Nanopore) performed equally to the newest generation of PacBio sequencing in terms of generated contigs, N50 contig lengths and average read lengths, which resulted in 14 high quality draft genome sequences. PacBio sequencing yielded higher read numbers and higher genome coverage, but maximum read length with Nanopore sequencing was longer. Based on the obtained data, we can state that the generation of high-quality fungal genome sequences can be achieved by different technologies leaving mycologists with options depending on their personal preferences.
The re-sequencing of H. pulicicidum (ATCC 74245) resulted in a substantially improved genome quality compared to the previous report (Nicholson et al. 2018). The genome size increased from 41.4 to 43.5 Mbp and contig numbers were reduced from 204 to 24. In addition, the N50 increased almost sevenfold from 580,679 to 3,855,590 bp. This demonstrates that long-read technologies enable access to additional genomic information (here more than 2 Mbp) offering a much more robust basis for genomic comparisons.
In the case of X. hypoxylon, the genome sequence of a different strain has been published recently (Büttner et al. 2019). The respective length of the Illumina assembly summed up to 42.8 Mbp, which is significantly shorter than what was obtained for the ex-epitype strain (54.3 Mbp) herein. Consequently, the number of predicted genes was lower as well (11,038 vs 12,704 here). These differences can be mainly attributed to the sequencing quality (N50 of 0.1 Mbp vs 3.8 Mbp, and 635 vs 88 contigs in our study, respectively). Similar to H. pulicicidum, third generation sequencing resulted in a substantial gain of genetic information (~ 15% of gene content), which makes subsequent analyses more reliable.
CAZyme analysis
The Hypoxylaceae are considered as endophytes that can switch to a saprobiontic lifestyle if necessary. However, there is no conclusive information on whether family members are able to become facultative parasites. To address this question at the genomic scale, a CAZyme analysis was conducted to investigate the number of enzymes possibly involved in carbohydrate interactions (Fig. 1). The number of related genes and modules ranged from 510 (H. lienhwacheense) to 780 (H. rubiginosum) with an average of 669. Glycoside hydrolases (GH) accounted for almost half of the identified enzymes (218–289), followed by the auxiliary activity proteins (AA, 96–187), glycosyl transferases (GT, 89–108) and carbohydrate esterases (CE, 76–136). Polysaccharide lyases (PL) comprised only a few genes (5–16) and carbohydrate-binding modules (CBM) accounted for approx. 7%of the hits (25–59). Depending on the strictness of the E-value settings in dbCAN (standard < 1e−15), a significant number of CBMs (8–21) and CEs (59–99) were not recognized, while the values for the other enzyme classes remained stable (see Supplementary information for more details). We want to highlight this result as many publications, which involve dbCAN analyses do not specify their parameters for CAZyme search. Therefore, comparisons with other datasets need to be treated carefully as they might lead to wrong conclusions.
The relatively high numbers of CAZymes in this study are in the range of other known saprobiontic fungi, such as Aspergillus oryzae or Penicillium chrysogenum (Zhao et al. 2013) and also match those of previously analyzed X. hypoxylon and A. stygium strains (Wingfield et al. 2018; Büttner et al. 2019). In contrast, the P. fici genome (which is also used in this paper for subsequent analyses) was reported to contain much higher numbers of respective enzymes (460 GHs, 138 CEs, 121 GTs, 39 PLs; Wang et al. 2015). As no parameters for the dbCAN analysis were reported in the corresponding reference, the encoded proteins of P. fici were re-analyzed using the same settings as for the Hypoxylaceae proteins. The previous results could be mostly confirmed except for CE, where we found even higher numbers (243), demonstrating that the capabilities of utilizing carbohydrate sources within the Xylariales can strongly vary.
When analyzing the individual families of CAZymes (a detailed list of which can be found in the Supplementary Information), a slightly higher number of proteins are categorized as cutinases (CE5, 3–8 genes) in comparison with the average results obtained by Zhao et al. (2013) for saprobiontic fungi (Ø = 1; 0–7 genes). Cutinases are responsible for the digestion of cutin and are usually correlated with plant infection events enabling a fungus to invade its host (Lu et al. 2018). This expanded set of CE5 enzymes corresponds well with the proposed initial endophytic lifestyle, as the entry into the host plants is a crucial step to establish such an association. Unfortunately, little is yet known on how Hypoxylaceae recognize and enter their host. Only in the case of Hypoxylon fragiforme has it been demonstrated that ascospore eclosion and germination occurs at high frequency when triggered by monolignol glucosides released from the beech host (Chapela et al. 1993). Even though there is no supportive data available, it is likely that the site of infection is primarily located at leaf surfaces (which is also the main tissue to acquire endophytic isolates of Xylariales). The germinating spores (ascospores or conidia) can enter the plant either by using opened stoma or by lysing the protective cuticle layer to grow into or in-between the cells. In general, there is a lack of knowledge about the host range of the individual Hypoxylaceae species as the association of their stromata with certain tree species does not necessarily reflect the distribution of a given species. This became, in particular, evident when various family members have been isolated from different plant parts of Viscum album and Pinus sylvestris (Peršoh et al. 2010), which in contrast never harbor Hypoxylaceae stromata. It can therefore be assumed that these fungi have a rather broad host range. Thus, an extended number of cutinases might be required to sustain the capacity of entering various hosts.
Pectin degrading enzymes are important to break down the cell wall of plants and are especially abundant in pathogenic fungi. Members of the majority of corresponding CAZYme families (CE8, PL1, PL2, PL3, PL9, PL10, GH28, GH78, GH88) can be found within our genomes (7 out of 9), but the number of copies is lower than those identified for most phytopathogenic fungi (CE8: 2.7 vs 4.0; PL1: 1.9 vs 5.3; PL3: 0.8 vs 3.1; PL9: 0.1 vs 0.5; GH28: 3.1 vs 7.1; GH78: 3.1 vs 4.3; GH88: 0.6 vs 1.3) (for detailed numbers see Zhao et al. 2013). Hence, Hypoxylaceae and X. hypoxylon encode in general the capabilities to degrade pectin and related carbohydrates, but to a lesser extent than plant pathogens.
The CAZyme analysis supports the hypothesis of Hypoxylaceae being endophytes without a major pathogenic life stage that can easily switch to a saprobiotic lifestyle based on their carbohydrate degrading capacities. Even though it is unlikely that these fungi actively harm their host, the nature of the symbiosis still remains obscure, and probably also varies during the host life cycle. Commensalistic or mutualistic interactions are generally possible and depend on the effect of the fungus on its host. The respective genome sequences should enable more thorough future studies to resolve these cryptic fungus-host interactions.
Phylogenomic analysis
Previous phylogenetic studies have shown that the Hypoxylaceae are a well-conserved monophyletic family (Wendt et al. 2018; Daranagama et al. 2018). In particular, the inclusion of house-keeping genes and calculation of multi-locus trees has improved the topology of the corresponding reconstructions, which well-resolved the structure of the major clades. Nevertheless, the respective tree backbones lack support, preventing conclusive results on intergeneric evolutionary relationships. Therefore, we aimed to construct a backbone tree based on a core set of proteins derived from genomic data of the genome-sequenced strains and compare the clade relationships with those of previously published studies. As gene or protein trees do not necessarily reflect the evolutionary history of taxa, a coalescence-based ASTRAL species tree was calculated for comparison. Moreover, we wanted to estimate how well genomic trees perform in comparison to multilocus trees. For the calculations, Pestalotiopsis fici was chosen as the outgroup due to its taxonomic classification in the Xylariales (note that the taxonomic position of the species is still under debate, as recent studies suggest a placement into the new order Amphisphaeriales, a proposed sister taxon of the Xylariales, see Jaklitsch et al. 2016; Hongsanan et al. 2017; Liu et al. 2019; Hyde et al. 2020 for discussion on this topic) and the relatively high quality of the genome sequence (Wang et al. 2015). An EDGAR analysis revealed the presence of 4912 homologous genes located in all 15 genomes, the amino acid sequences of which were used to assess the relative phylogenetic position of the fungi. The concatenation of the 4912 GBLOCKS-cured amino acid alignments resulted in a data matrix with 2,532,758 sites, from which 49.0% were constant and 28.5% variable sites. Cured single loci alignments for species-tree inference ranged from 44 to 4781 amino acids (median 437.5 ± 358.4 aa). Alignment length, selected substitution models, distinct patterns, log-likelihood, constant and variable sites for each alignment are summarized in the supplementary material. The resulting gene and species tree showed identical topologies with four major clades (C1–C4, Fig. 2), which include all Hypoxylaceae species except H. rubiginosum. The backbone and all branches received maximum bootstrap support or posterior probability values in both phylogenomic reconstructions. As expected, X. hypoxylon forms a sister clade to the Hypoxylaceae, where H. rubiginosum is located at a separate branch at the base of the family clade.
Clade 1 (C1) is composed of H. fragiforme and H. rickii with both species displaying similar branch lengths. These taxa have been already demonstrated to be closely related based on morphological, chemotaxonomical and phylogenetic data (Kuhnert et al. 2014a, 2015c). They mainly differ in their stromata shape (hemispherical vs. effused-pulvinate) and distribution (temperate zones vs tropical zones).
The second clade (C2) contains H. pulicicidum, A. truncatum and J. multiformis. Before the erection of Jackrogersella, the latter two species were placed in the same genus based on their morphological similarities (carbonaceous layer surrounding the perithecia, ostioles higher than the stromatal surface, dehiscent perispore with conspicuous thickening). However, their strongly deviating secondary metabolite profiles in conjunction with completely reduced stromatal discs set them clearly apart (Kuhnert et al. 2017; Wendt et al. 2018). In contrast, the relationship with H. pulicicidum does not seem obvious in the first place, but the presence of papillate ostioles in overmature stromata and reduced discs in younger material morphologically resembles the main characteristics of Jackrogersella. In addition, the appearance of cohaerin-type stromatal pigments in H. pulicicidum (and closely related Hypoxylon species, see Sir et al. 2019) is completely restricted to these taxa and members of Jackrogersella. Based on the phylogenomic reconstruction and taxonomic significant similarities, it can be speculated that Annulohypoxylon is derived from Jackrogersella, which by itself shares a common ancestor with Hypoxylon. Hence, Annulohypoxylon in its current definition represents in an evolutionary context one of the derived genera in the Hypoxylaceae.
The genus Hypomontagnella forms the third clade (C3) in the phylogenomic tree. Two of the representatives are endosymbiontic isolates (Hypom. submonticulosa from plant and Hypom. spongiphila from a marine sponge) and only Hypom. monticulosa is stroma-derived. The marine isolate was originally identified as Hypom. monticulosa and is now treated as a distinct new species based on the genomic comparisons (see the taxonomic section for more details). The Hypomontagnella clade is located on a long branch supporting its status as a distinct genus. Due to the sister clade placement of C3 to C4 Hypomontagnella taxa are evolutionarily closer to Daldinia and Pyrenopolyporus than to other genera included in this study (note that the taxonomic position of H. lienhwacheense is unclear). This result contradicts a previous study where Hypomontagnella grouped with C2 in a multigene phylogenetic tree (based on the ITS, 28s rDNA, RPB2 and TUB2 loci) with statistical bootstrap support (Lambert et al. 2019). These differences might be caused by the restricted sample size in our analysis but could also indicate that phylogenomic data better resolve the taxonomic relationship of Hypomontagnella. Future studies with a larger taxon selection will prove whether the organization of this basic phylogenomic tree remains stable.
In clade 4 (C4), the representatives of Daldinia, Pyrenopolyporus and H. lienhwacheense clustered together. The peltate to hemispherical shape of Pyrenopolyporus stromata and its naphthalene content are reminiscent of Daldinia and underline their relationship. The position of H. lienhwacheense is inconclusive especially since the metabolite profile of this taxon is unique among all analyzed Hypoxylaceae species (unpublished data). In addition, its stromata are irregular in form with conspicuous elevations. The latter could be an evolutionary precursor of peltate ascomata, which further evolved into hemispherical shapes. In a recently published phylogenetic analysis with a much larger taxon selection, H. lienhwacheense also clustered with Daldinia species and Pyrenopolyporus species but lacked support and had no close relatives (Sir et al. 2019). Moreover, the genome of H. lienhwacheense is the smallest among all sequenced species herein with a size reduction of 12.5 Mbp compared to H. rubiginosum. Therefore, it is possible that H. lienhwacheense represents a derived lineage and has undergone heavy gene or chromosome loss in the course of adaptation. Due to its isolated position in the phylogenies, the genome characteristics and metabolite profile, its affinities with Hypoxylon are only supported by certain morphological features. The data would justify the erection of a new genus, but we refrain from this for the time being as we see the necessity to include the putative relatives H. lividipigmentum, H. lividicolor and H. brevisporum in a corresponding study. The lack of viable cultures for the latter two species currently prevent such an approach.
Hypoxylon rubiginosum appears as a basal branch within the Hypoxylaceae. Based on the branch length and position (sister group of the Xylariaceae), this fungus most likely represents an old lineage. Still, the morphological habitus of H. rubiginosum is typical for the family and has little in common with the genus Xylaria. To deduce the divergent evolution of the two families and understand the development of the distinct morphologies, many more representatives of the Xylariaceae (e.g. Nemania spp., Kretzschmaria spp., Rosellinia spp.), and also other related families (e.g. Graphostromataceae spp., Barrmaeliaceae spp., Lopadostomataceae spp.) should be sequenced.
To evaluate the strengths of our amino acid-based phylogenomic reconstruction compared to previous standard phylogenies, we calculated a multigene tree for the same species set based on the complete gene sequences of alpha-actin, beta-tubulin, RPB2 and TEF1-α (Fig. 2c). The concatenated dataset of the four genes had a final length of 8566 nucleotides of original 10,614 positions (80.7% of the original alignment). Unique site patterns were constituted by 3007 positions, while 2713 were parsimony informative and 4925 invariant sites. The substructure of the clades is resolved identically to the phylogenomic trees, but the clades are arranged in a different order and with lower support values for the backbone. Thus, we conclude that the multigene phylogenetic family reconstruction does only correctly reflect the evolutionary origin of closely related genera, while the relationship of major lineages (here equal to the clades) remains unresolved. Thus, previously published phylogenies of the Hypoxylaceae (Wendt et al. 2018; Lambert et al. 2019) can only provide a robust picture of infrageneric relationships. In addition, the position of certain family members such as H. trugodes and H. griseobrunneum remains unresolved in these calculations and will most likely only stabilize in phylogenomic reconstructions. Our results of the phylogenomic analysis are promising in this regard and the inclusion of more high-quality genome sequences will help to understand the evolution and speciation of Hypoxylaceae.
For the phylogenomic reconstructions, we have deliberately decided to use protein sequences rather than nucleotide sequences as these are directly correlated with structural and functional information, and thus better reflect evolutionary divergence (Chowdhury and Garai 2017). However, it is important to note that this approach has its limitation for very closely related organisms as synonymous changes of nucleotides do not alter the protein sequence and hence core proteins only provide restricted information (which is the case for the species pair Hypom. monticulosa and Hypom. spongiphila). In addition, protein level phylogenomic reconstructions are susceptible to sequencing errors, which can strongly influence the predictions of genes and therefore should only be considered if high-quality genome sequences are available. On the other hand, the slow evolution of protein sequences allows for a more robust inference of relationships between fungal families and higher ranks, where the alteration of the respective gene sequences is usually too strong for reliable alignments. Thus, we are convinced that protein-based phylogenetic reconstructions will become more important in the future to reevaluate current taxonomic concepts.
Comparative genomics
POCP analysis
To better estimate the genomic differences between the fungi and to deduce the relative amount of individual genes, a percentage of conserved protein (POCP) analysis was conducted (Fig. 3). In advance, it is important to state that the results of POCP analyses depend on the reliability of the applied gene prediction models. The more accurate such models can call genes and their respective coding sequences, the higher the significance of POCP becomes. To test the quality of the gene prediction in our genomes, the transcriptome of Hypom. monticulosa under two different conditions was sequenced and reads were implemented into the Braker pipeline for improved gene prediction (Hoff et al. 2019). Braker called 12,744 genes, while GeneMark and Augustus predictions resulted in 12,477 and 11,204 genes, respectively. Sizes and intron mapping of predicted genes displayed high similarity between the Braker and GeneMark output. In contrast, roughly 30% of the genes predicted by Augustus deviated in these parameters. Thus, it can be concluded that GeneMark already delivers a good coverage of the gene content, which is comparable to RNA-Seq-based prediction pipelines. Consequently, protein sequences derived from GeneMark predictions were chosen as the starting point for subsequent analysis. Nevertheless, protein translations can still differ from the actual protein sequences, which however has usually little influences on the recognition of orthologous proteins in other organisms. In addition, the number of unpredicted genes cannot be evaluated by the applied methods and therefore some of the conserved proteins might be missed for the POCP analysis. As a result, the data presented in the following have to be treated as an approximation to the real values.
To study how POCP numbers vary across families, P. fici (Sordariomycetes, Xylariomycetidae, Xylariales, Sporocadaceae), Aspergillus niger and Penicillium chrysogenum (both Eurotiomycetes, Eurotiomycetidae, Eurotiales, Aspergillaceae) were included in the dataset. The values ranged from 72.5 to 95.8% within the Hypoxylaceae and were in general higher for phylogenetically related species, albeit with some exceptions. For example, H. rickii shares 82.6% of its encoded proteins with P. hunteri, and Hypom. monticulosa shows a similar number compared to H. pulicicidum. Surprisingly, between the selected members of the genus Hypoxylon the POCP did not exceed values higher than 79.8%, but also did not fall below 72.5% confirming the already known heterogeneity of the genus (Wendt et al. 2018). As these numbers strongly overlap with those of species from different genera or are even sometimes smaller, there is no clear boundary between interspecific and intergeneric POCPs. In comparison, the values between other members of the Xylariales and Hypoxylaceae are located in the range of 62.5 to 66.8% for X. hypoxylon and 55.6 to 60.1% for P. fici. These values do not overlap with those calculated for species within the Hypoxylaceae in our taxon selection. The POCP values significantly drop when comparing unrelated species as demonstrated by A. niger and P. chrysogenum where the numbers were always below 50% (but not lower than 40.3%). The POCP analysis has been proposed as a measure to define genus boundaries in prokaryotes (Qin et al. 2014). The authors demonstrated that species of different genera share less than half of their proteins and consequently selected 50% as threshold for genus delimitations. As shown herein this value is not applicable for fungal taxonomy and overall proves that genes are much more conserved in fungi than in bacteria. However, POCP analysis might be helpful to delimit species or define family borders in fungi. Even very closely related species differed in at least 4.2% of their encoded conserved protein content. In order to determine how valuable this number is, various additional isolates of individual fungi should be genome-sequenced to compile a proper comparison and support the data by statistical means. In our example, the POCP analysis always yielded values between non-family members below 70%, while the values between Hypoxylaceae species were in all cases above 70%. Whether this threshold can be generally applied for fungi or whether it is even valid for the Xylariales cannot be answered yet as additional fungi of various families from this particular order need to be sequenced. Furthermore, sequences of a broad range of genera within other fungal families are scarce to conduct such a respective study on the class or even division level. Nevertheless, based on our data we propose that members of the Hypoxylaceae share on average at least 70% of their encoded proteins. To the best of our knowledge, this is the first study to apply POCP analysis to a larger taxon selection in fungi.
ANI analysis
Besides the POCP, we analyzed the Average Nucleotide Identity (ANI, Fig. 4) of the species which measures the nucleotide-level genomic similarity between two genomes (Arahal 2014). The ANI values between members of the Hypoxylaceae varied from 71.1 to 93.3%. The (by far) highest similarity was found between the genomes of Hypom. monticulosa and Hypom. spongiphila. All other genome comparisons showed maximum values of 80.4%. In contrast, the nucleotide difference between X. hypoxylon and Hypoxylaceae species remained relatively stable around 32% (ANI 68.5–67.8%). Based on our calculations, family members of the Hypoxylaceae share at least 70% of their nucleotide content. This threshold is identical with the one estimated for the POCP analysis. ANI analyses have been widely applied to check genomic variations between prokaryotic genomes and even revealed clear species boundaries (Jain et al. 2018). However, in fungi this method has only been used for very small taxon selections. In the case of Rhizoctonia solani, four isolates were studied which only showed approximately 80% sequence similarity (Wibberg et al. 2015). In a more recent study, ANI was used to compare four related dermatophytic fungal species (Arthrodermataceae). The generated numbers (76.4–90.0%) are comparable to those obtained in our study (Alshahni et al. 2018). Due to the lack of representative genome sequences for many fungal genera, the overall value of ANI analyses in fungal taxonomy, in particular for species and family delimitation, cannot be assessed by now. However, our study provides the first evidence that this method has an important application potential for mycologists in the future.
POCP vs ANI
When comparing the ANI and POCP between species pairs in the Hypoxylaceae, it can be seen that the latter values are overall higher. This is an indicator that genomic variation within a family is more strongly influenced by the changes in nucleotide content of conserved genes (such as single nucleotide polymorphisms—SNP) than by changes in gene content (including gene loss, gene gain, gene duplication). Interestingly, this pattern changes completely when comparing family members of the Hypoxylaceae with non-family members, where the POCP numbers are much lower than the respective ANI numbers. In the case of the species pair X. hypoxylon/P. fici, the results become even more striking with an ANI of 70.8% compared to a POCP of 58.9%. Consequently, it can be assumed that, in these particular cases, gene gain and loss is much more impactful towards speciation (and in consequence the development of phenotypic characters). Whether this observation represents a general pattern in fungi cannot be deduced from our preliminary data as much more representative species from the different families of the Xylariales and also different orders need to be included. However, it points towards a weak spot in phylogenetic analysis for inferring taxonomic relationships as these calculations only reflect nucleotide differences in sequences of conserved genes. Future studies likely need to take variations on genome scale, such as genome size, gene content and genome rearrangement, into consideration to understand the evolution of fungi.
AAI analysis
The average amino acid identity (AAI, Fig. 5) measures the differences between orthologous proteins of different organisms and therefore well reflects their evolutionary distances. Within our selected Hypoxylaceae, the AAI ranged from 76.1 to 98.5% and X. hypoxylon showed values between 68.3 and 70.7% when compared to other species. For P. fici the numbers dropped slightly and were in the range of 65.3 to 67.8%. It appears that for this restricted taxon selection there is a clear difference between intergeneric and interfamilial values with an estimated threshold value of 75%. The AAI between Hypom. monticulosa and Hypom. spongiphila was by far the highest with 98.5% showing that the conserved proteins are nearly identical for species that have recently diverged. Whether this value overlaps with intraspecific AAI levels cannot be answered yet. This topic needs to be addressed in the future when more genome sequences of the same species become available. As expected, for all pairs the AAIs were higher than the ANIs as changes in the nucleotide sequence do not necessarily result in amino acid changes due to the genetic code (Castro-Chavez 2010). However, with increased taxonomic distance the AAI values approximated those of the ANI. This result can be explained by a stochastic effect, where the likelihood of silent mutations (i.e. those that do not change the encoded amino acid) decreases with increasing numbers of mutations. Our preliminary data suggest a potential application for AAI as means of taxonomic discrimination on family level. However, in fungi, in contrast to the POCP, AAI is more prone to errors in gene prediction and should only be considered for genomes with high quality intron mapping in an optimal way verified by RNA sequencing.
Gene-based genome comparison
As the final step in our genomic comparison, we analyzed the number of shared (core) and individual genes between most closely related species of the Hypoxylaceae (corresponding to the phylogenomic clades) as depicted in Venn diagrams (Fig. 6). Similar to the POCP, the numbers strongly depend on the accuracy of gene prediction and hence are not exact measurements. Nevertheless, they give an insight into the general distribution of orthologous genes between related and unrelated species. It is also important to state that the identified core genes in this analysis are only partially identical with the identified conserved proteins of the POCP analysis (see Methods section for more details), therefore the values can significantly vary.
Within clade 1, where H. rubiginosum was included for this analysis, the species had 7911 protein encoding genes in common. Despite the phylogenetic position of H. rubiginosum the individual pairs within the diagram contained approx. the same amount of shared genes ranging from 8405 (between H. fragiforme and H. rickii) to 8569 (between H. rickii and H. rubiginosum). When looking at the individual genes, H. fragiforme showed the least amount (1632), followed by H. rickii (2038) and H. rubiginosum (4187). It appears that the discrepancy between the unique genes of H. rubiginosum and the other species is not a consequence of a reduced amount of shared genes, but instead caused by heavy gene gain (or less likely gene loss in all other species). This is also reflected by the genome size of H. rubiginosum which is by far the largest (48 Mbp) among the sequenced Hypoxylaceae.
Species within clade 2 share 8656 core genes and contain between 1619 (J. multiformis) to 2386 (H. pulicicidum) singletons. The amount of common genes between the individual pairs in the clade varied only slightly in the range of 9107 to 9243 genes.
The three genome sequenced members of the Hypom. monticulosa species complex (clade 3) contain a core set of 10,253 genes. The gene overlap of Hypom. spongiphila with Hypom. submonticulosa (10,330) is smaller than with Hypom. monticulosa (11,916). The latter exhibits also the lowest number of unique genes (510) within the three species, while Hypom. spongiphila and Hypom. submonticulosa contain higher numbers (640 and 1271, respectively). This result clearly indicates that despite their close taxonomic (and evolutionary) relationship, these taxa have already diverged regarding the gene content.
The last gene-based comparison involves all four species of clade 4. Within this data subset, 6631 genes were shared among all taxa. A pairwise comparison reveals the presence of 7623 to 8563 common genes between H. lienhwacheense and other clade members. Similar numbers of conserved genes can be found for P. hunteri in relation to D. concentrica and E. liquescens (8008 and 7116, respectively). The latter two share the highest amount of genes (8791). Despite the relative small genome size of H. lienhwacheense (35.8 Mbp) and its low number of total genes (9924), this taxon still contains 1097 genes (11.1% of total gene content), which lack in the other three related organisms. In comparison, 1257, 1373 and 1490 singletons were identified for D. concentrica, E. liquescens and P. hunteri, respectively.
Based on these results, two main conclusions can be drawn. First of all, there is no obvious correlation between the number of shared genes and the genetic distance of two species indicating that presumably very closely related species can have less genes in common than distantly related species pairs. This is well reflected in Fig. 6a where H. rickii shares almost 200 fewer genes with its close relative H. fragiforme than with the more distantly related H. rubiginosum. Secondly, the relatively high number of singletons (genes exclusively present in a single genome within a given data matrix), in particular between genetically close species (e.g. Hypom. monticulosa vs. Hypom. spongiphila, H. fragiforme vs. H. rickii, D. concentrica vs. E. liquescens), suggests a high rate of gene turnover. Due to the small set of genome-sequenced species, it cannot be evaluated which are the major driving forces of this result, but we assume that gene loss and gene acquisition contribute equally to this outcome. In the latter event, lateral gene transfer has been shown to substantially contribute to this phenomenon especially in cases where fungal species (such as Trichoderma species or plant-pathogens in the Magnaporthales) can colonize a broad range of habitats and are in frequent contact with other fungi (Qiu et al. 2016; Druzhinina et al. 2018). As endophytes and saprotrophs the Hypoxylaceae are also in a constant competitive environment with other fungi offering an ideal opportunity to acquire new genes.
Authenticity of E. liquescens ATCC 46302
As a representative of a rather unusual genus inside the Hypoxylaceae, we decided to include Entonaema liquescens in our genome sequencing project. Members of Entonaema feature brightly colored stromata with usually orange KOH-extractable pigments and a liquid-filled cavity as a unique character and can therefore be distinguished from other hypoxyloid genera (Stadler et al. 2008). The type species of the genus is E. liquescens with a single existing culture which has been deposited at the ATCC (46302) more than 30 years ago. Different single loci of the strain were sequenced and included in various phylogenetic reconstructions of the family. In all of these studies, the fungus nested inside the genus Daldinia. As the massive stromata of Entonaema are reminiscent of those found in Daldinia and some species of the latter are known to possess gelatinous interiors (e.g. the Daldinia vernicosa group, and in particular D. gelatinoides; Stadler et al. 2014a, b), the taxonomic position of Entonaema appeared plausible (Triebel et al. 2005; Wendt et al. 2018). From a chemo-taxonomical point of view, E. liquescens and a few other members of the genus (e.g. E. cinnabarinum, E. globosum) show striking differences to the phylogenetically related Daldinia species due to the presence of mitorubrin-type azaphilones (Stadler et al. 2004). Within the Hypoxylaceae, this type of pigments can otherwise only be found in members of the genus Hypoxylon, thus offering a possibility to study conservation events and evolutionary aspects of the azaphilone biosynthesis on family level. In the context of a related study, we identified various sets of genes (i.e. biosynthetic gene clusters) putatively responsible for the assembly of mitorubrin-type compounds in H. fragiforme (Becker et al. in preparation) and found a high degree of gene conservation in other azaphilone-producing Hypoxylaceae (i.e. J. multiformis, H. pulicicidum, H. rickii, H. rubiginosum). In contrast, most of the species that are devoid of those pigments did not contain the respective genes with the exception of Hypom. monticulosa and Hypom. spongiphila (data not shown, but will be part of a subsequent study). When screening the genome of E. liquescens for the presence of homologous gene clusters, no hits could be found. As the genome sequence of this fungus provides one of the best resolutions (N50 of 3.5 Mbp, 31 contigs) within this study, it appears unlikely that the cluster has been missed due to sequencing gaps. This still leaves the possibility of convergent evolution of azaphilone biosynthetic genes in E. liquescens. However, as the genetic basis for azaphilone assembly seems to be conserved across various ascomycetes (see Monascus spp. and Aspergillus niger; Zabala et al. 2012; Chen et al. 2017), this is not a likely scenario. Consequently, the genomic investigation raised serious concerns about the authenticity of the E. liquescens strain ATCC 46302. To exclude the possibility that the strain has been contaminated on our side, the extracted ITS and TUB2 sequences of the genome were compared to those previously reported (AY616686, KX271248; Triebel et al. 2005; Wendt et al. 2018), which were in agreement with each other. In addition, the sequences do not fit with any known ITS-sequenced Daldinia species (Stadler et al. 2014b) (closest BLAST hits with 97.0% similarity is D. korfii, Sir et al. 2016) rendering identification to species rank impossible in case of a contaminant. In order to confirm whether the isolate it a true member of Entonaema or a contaminating Daldinia species, fresh specimens of the genus and in particular E. liquescens need to be recollected and multiple isolates have to be obtained for comparison. As this has not been achieved yet due to the rare occurrence of E. liquescens, ATCC 46302 is retained in the analysis herein, especially since it still constitutes a member of the Hypoxylaceae.
Taxonomy
Based on the genomic and ecological differences between Hypomontagnella monticulosa MUCL 54604 and tentatively identified Hypom. monticulosa CLL-205, a new species is proposed and described in the following for the latter fungus.
Hypomontagnella spongiphila Kuhnert, sp. nov.
MB 833748 (Figs. 7, 8)
Holotype: French Polynesia, Tahiti, coastal area, cave of Ti-Pari, 20 m depth, 9°45.421′S, 139°08.275′W, isolated from a Sphaerocladina sponge, 17 Dec. 2015 (host material), leg. C. Debitus, UP-CLL-205 (ex-type culture MUCL 57903, GenBank Acc. No.: ITS—KY744359/ MK131719, LSU—MK131717, RPB2—MK135890, β-tubulin—MK135892).
Etymology: Refers to the sponge host from which it was isolated.
Known distribution/host preference: Only known from the holotype.
It differs from Hypomontagnella monticulosa by its marine habitat, sterile cultures and slow growth on PDA as well as radially furrowed cultures on OA.
Sexual and asexual morph: not observed.
Culture: After 16 d at room temperature. Colonies on YMG white, sterile, velvety to felty and flat; zonate only in the center, with entire margins, reaching > 60 mm diam., reverse uncolored; on PDA white, sterile, azonate, velvety to slightly cottony, with slightly undulate margins, reaching > 40 mm diam., reverse uncolored; on OA reaching > 50 mm diam., white, sterile, velvety with slightly cottony at the center, radially furrowed surface and filiform margins formed by aerial mycelium tufts, reverse uncolored. Mycelial cords are sporadically formed on all media. No sporulation observed.
Secondary metabolites: Cultures produce sporothriolide and derivatives, sporochartines A-E, trienylfuranol A in PDB.
Notes: Hypomontagnella spongiphila appears identical to Hypom. monticulosa based on phylogenetic and chemotaxonomic data as sequences only slightly differ and the production of metabolites is very similar (Lambert et al. 2019). Morphological characters for comparison are rare as the fungus remains sterile under different culture conditions and stromata formation cannot be induced artificially. It also appears unlikely that sexual structures are formed in a marine environment. Therefore, Hypom. spongiphila was grown on different media (YMG, PDA, OA) in comparison with its close relatives Hypom. monticulosa and Hypom. submonticulosa (Fig. 7). In general, growth speed of the strain on PDA was much slower than those observed for the other fungi. Both relatives formed sporulating regions after 14 days, while Hypom. spongiphila remained sterile even after 5 weeks of incubation. On YMG media morphology only slightly differed between Hypom. spongiphila and Hypom. monticulosa, whereas Hypom. submonticulosa differed substantially. The appearance of the strains on OA was strikingly different as Hypom. monticulosa only produced visual mycelia in the center, which started to grow subsurficially towards the edge of the plate. In contrast, Hypom. spongiphila formed radially furrowed surfaces with filiform margins, which appeared related to the morphology of Hypom. submonticulosa. The genomic comparison revealed obvious differences between the different Hypomontagnella isolates (see Fig. 6c) in terms of gene content. More than 600 genes were predicted to be restricted to Hypom. spongiphila in the Hypomontagnella subset, whereas the direct comparison of H. spongiphila and H. monticulosa shows more than 700 singletons corresponding to roughly 5.5% of the total gene content (note that this value slightly differs from the respective POCP dataset as both analyses are based on different models for homology search). The function of the majority of these singletons could not be identified as the respective protein sequences showed no known homologs in the BLAST databases. These genes might be involved in the adaptation to marine environments, in particular in osmotolerance and nutrient uptake. To study whether the genetic differences between Hypom. monticulosa and Hypom. spongiphila can be also seen in the organization of the chromosomes, a macrosynteny analysis between both genomes was carried out. In Fig. 8 a high genomic synteny can be observed between both organisms with occasional inversions of larger chromosomal areas. This suggests that the separation of the two species is a rather recent event, which is also supported by the 93.3% average nucleotide identity (Fig. 4) and 98.5% average amino acid identity.
The erection of a new fungal species evidenced by genomic information is a novel approach in fungal taxonomy. Such approaches are only feasible if genome data of related organism are available or are generated in the context of respective studies. We are aware that such sophisticated techniques are currently still not feasible for the majority of researchers. That is why for the time being other diagnostic features need to be included in the species description, which can be accessed by taxonomists. We see the ecological niche of Hypom. spongiphila and the morphological differences of the cultures as sufficient to fulfill such a criterion.