Introduction

The Fusarium fujikuroi species complex (FFSC, previously referred to as the Gibberella fujikuroi species complex) is one of several monophyletic assemblages in the genus Fusarium (phylum Ascomycota, order Hypocreales) (Geiser et al. 2013). The complex is well known for the many plant pathogens and mycotoxin producers it includes (Kvas et al. 2009). Previous work suggests that the FFSC likely emerged during the middle-to-late Miocene (O’Donnell et al. 2013) and that its evolutionary history could have involved interspecies gene flow (O’Donnell & Cigelnik 1997). Such interspecies interactions have also been described in other Fusarium lineages, e.g. the F. oxysporum and F. graminearum species complexes (Ma et al. 2010, O’Donnell et al. 2000a). In the FFSC, they were suggested to explain the existence of multiple non-orthologous copies of the internal transcribed spacer region of the ribosomal RNA genes (O’Donnell & Cigelnik 1997).

Interspecies gene flow is typically associated with hybridization and introgression (Stuckenbrock 2016). Hybridization is the production of viable, recombinant offspring by non-conspecific individuals. Introgression occurs when short-lived hybrids backcross with individuals from a parental species, allowing new genetic material to be incorporated into the genome of that parental species. The process of introgressing new genetic material into the gene pool of a species is referred to as “introgressive hybridization” (Anderson & Hubricht 1938). In nature, interspecies gene flow is generally thought to be limited by species isolation mechanisms, such as vegetative incompatibility, pheromone-receptor recognition, intersterility and post-zygotic nuclear-cytoplasmic incompatibility systems, which restrict or prevent the exchange of genetic material between species (Giraud et al. 2008). In many Fusarium species, including those in the FFSC, laboratory-based mating studies have shown that reproductive isolation is incomplete and that various species are capable of interbreeding (Desjardins et al. 2000, Leslie et al. 2004b).

The mitochondrial (mt) genome is potentially a valuable tool for studying hybridization and introgression in fungi. Fungal mitochondria are mostly inherited from the maternal parent (Taylor 1986), but cross-species interactions would often lead to a short-lived heteroplasmic state in which the hybrid individual harbours mt haplotypes from both parents (Ballard & Whitlock 2004, Barr et al. 2005). Recombination between the different haplotypes could introduce new genes or regions into one or both mt genomes, or replace existing ones. Signatures of such ancestral cross-species interactions would thus be retained in the species’ mt genomes, even though one of the mt haplotypes would typically be purged from subsequent populations (Rand 2001). In fish, for example, Wilson & Bernatchez (1998) described an ancient introgression in Salvelinus namaycush (lake trout) based on the presence of a single mt haplotype belonging to S. alpinus (Arctic char) in the S. namaycush population. In plants, Jaramillo-Correa & Bousquet (2005) described mitochondrial recombination between Picea mariana (black spruce) and P. rubens (red spruce) as a result of introgressive hybridization in the zone of contact between these conifers in north-eastern North America. Examples from fungi are still limited, but Fourie et al. (2013) hypothesized that the incongruence observed among gene trees inferred from mitochondrial genes could have resulted from recombination between the mt genomes of non-conspecific individuals.

In this study we considered the hypothesis that introgression or introgressive hybridization occurred in the history of the FFSC. Our first aim was to identify and characterize regions in the mt genomes of extant FFSC species that potentially originate from such interspecies gene flow events. For this purpose, the mt genomes of two FFSC species (F. temperatum and F. mangiferae) were determined and used to complement those already in the public domain (F. circinatum, F. verticillioides and F. fujikuroi) (Al-Reedy et al. 2012, Fourie et al. 2013). These genomes were then analysed with the RDP (Martin & Rybicki 2000), Bootscan (Martin et al. 2005), Geneconv (Padidam et al. 1999) and Maximum χ2 (Smith 1992) methods, which were designed for detecting interspecific recombination (Martin et al. 2010). The second aim of this study was to use a coalescent-based approach to detect ancestral recombination in the mt genes of extant FFSC species (Price & Carbone 2005). We purposefully did not employ phylogenetic methods, given the low sequence diversity observed in the mt genes of the FFSC and other fungi (e.g. Seifert et al. 2007, Huang et al. 2008, Fourie et al. 2013). For these coalescent analyses, we used sequences of five mt genes (atp6, cox2, nad3, nad5 and nad6), previously shown to support incongruent phylogenetic histories (Fourie et al. 2013), from a collection of species spanning the diversity of the FFSC. To assess the potential effects of false negatives and/or systematic errors (i.e. artefacts that arise from failure to fully account for the properties of the data) (Delsuc et al. 2005), we also evaluated the degree to which selection and substitution rate heterogeneity affected the individual mt gene datasets.

Material and Methods

Fungal isolates

Twenty-seven Fusarium isolates representing three, four and five species in the “African”, “Asian” and “American” clades of the FFSC, respectively (O’Donnell et al. 1998, 2000b), were used (Table 1). This collection included the standard mating-type tester strains for the nine mating populations (MP-A to MP-I), or biological species, of the FFSC (Leslie & Summerell 2006, Kvas et al. 2009), as well as representatives of F. mangiferae and F. sterilihyphosum.

Table 1 Isolate information for the Fusarium fujikuroi complex species used in this study.

Mt genome sequencing and assembly

To determine the mt genome sequences of F. temperatum isolate CMWF 389 and F. mangiferae isolate CMWF 1214 (Table 1), total genomic DNA was extracted as described previously (Groenewald et al. 2006) and pyrosequenced at Inqaba Biotechnologies (Pretoria, South Africa) on a single lane of the GS-FLX platform (454 Life Sciences, Roche, CT). After exclusion of low-quality reads, reads encoding mt sequences were identified by BLAST comparison with the available FFSC mt genomes (Al-Reedy et al. 2012, Fourie et al. 2013). The mt reads for each species were then assembled de novo with CLC Genomics Workbench version 6.0 (CLC bio, Århus, Denmark). The order and orientation of contigs were determined using the F. circinatum, F. verticillioides and F. fujikuroi mt genomes as references. Gaps between contigs were filled manually by Sanger sequencing. Protein-coding and tRNA genes were identified with MFANNOT and RNAweasel (http://megasun.bch.umontreal.ca) (Lang et al. 2007), as well as with tRNAscan-SE (Lowe & Eddy 1997). Gene identities were confirmed with BLASTp searches against the NCBI database.

Mt genome-based recombination analysis (RDP, Bootscan, Geneconv and Maximum χ2)

The Recombination Detection Program (RDP) package version 3.44 (Martin et al. 2010) was used to screen the five FFSC mt genomes for possible recombination events with the RDP (Martin & Rybicki 2000), Bootscan (Martin et al. 2005), Geneconv (Padidam et al. 1999) and Maximum χ2 (Smith 1992) methods. Because these methods differ in their power to detect recombination (Wiuf et al. 2001, Posada 2002), their results were compared and only recombination events identified by all four methods were considered. The five mt genomes, as well as the individual gene datasets used in the coalescent-based analyses (see below), were aligned using CLC Genomics Workbench.

The RDP method identifies potential recombinant segments by plotting the pairwise percentage identity values for every combination of three sequences (isolates) in the dataset. A potential recombinant region is one in which the pairwise identity of sequence A to C, or of B to C, is higher than that of A to B, even though A and B are otherwise more closely related to one another than to C. The probability that such a region arose by chance is then approximated using the binomial distribution (Martin & Rybicki 2000). Bootscan identifies potential recombinant segments by constructing trees from pairwise distances, with bootstrap replicates, within overlapping sequence windows; high bootstrap support for different tree topologies in different windows indicates potential recombinant regions (Martin et al. 2005). Geneconv detects recombination by identifying aligned sequence fragments shared by a pair of sequences, where a match between the two sequences scores +1 and a mismatch incurs a penalty −m. The mismatch penalty depends on the density of polymorphic sites between the sequences and on the mismatch intensity parameter (G-scale), which is proportional to the total number of site differences (i.e. polymorphic sites) between the two isolates (Padidam et al. 1999). The Maximum χ2 test searches for recombination breakpoints by comparing the number of segregating sites on either side of a putative breakpoint and calculating 2 × 2 χ2 values as a measure of the difference between the two sides of the partition (Smith 1992). The significance threshold was set to p = 0.05 for all methods.
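The Python sketch below illustrates the breakpoint scan behind the Maximum χ2 test. It is a simplified version under several assumptions and is not the RDP3 implementation. It takes a gap-free alignment and keeps only the columns that are variable among all sequences. For one pair of sequences, it records whether the pair differ at each of those columns. It then computes the 2 × 2 χ2 statistic (differ/identical × left/right of the breakpoint) at every possible breakpoint and estimates a p-value by permuting site order. The function names (chi2_2x2, best_split, max_chi2) and the permutation settings are hypothetical.

```python
import random


def chi2_2x2(a, b, c, d):
    """Pearson chi-square for the 2 x 2 table [[a, b], [c, d]] (no continuity correction)."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def best_split(diffs):
    """Return (max chi-square, k) over all breakpoints; k sites lie left of the best breakpoint.

    diffs is a 0/1 list with 1 where the two sequences differ at a variable site.
    """
    total_diff = sum(diffs)
    best, best_k = 0.0, 0
    left_diff = 0
    for k in range(1, len(diffs)):
        left_diff += diffs[k - 1]
        a, b = left_diff, k - left_diff        # left of breakpoint: differ, identical
        c = total_diff - left_diff             # right of breakpoint: differ
        d = (len(diffs) - k) - c               # right of breakpoint: identical
        stat = chi2_2x2(a, b, c, d)
        if stat > best:
            best, best_k = stat, k
    return best, best_k


def max_chi2(alignment, i, j, n_perm=1000, seed=1):
    """Simplified Maximum chi-square scan for sequences i and j of a gap-free alignment.

    Returns (max chi-square, alignment column starting the right-hand segment, permutation p-value).
    """
    # Only columns that are variable somewhere in the alignment are informative.
    cols = [p for p in range(len(alignment[0])) if len({s[p] for s in alignment}) > 1]
    if len(cols) < 2:
        return 0.0, None, 1.0
    diffs = [int(alignment[i][p] != alignment[j][p]) for p in cols]
    observed, k = best_split(diffs)
    # Shuffling site order destroys any spatial clustering of differences.
    rng = random.Random(seed)
    shuffled = diffs[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if best_split(shuffled)[0] >= observed:
            hits += 1
    return observed, cols[k], (hits + 1) / (n_perm + 1)
```

Consider a toy alignment of three 20-bp sequences: "A" × 20, "A" × 10 + "C" × 10, and "C" × 10 + "A" × 10. The first two sequences are identical over columns 0 to 9 and differ over columns 10 to 19. For this alignment, max_chi2(aln, 0, 1) peaks at χ2 = 20, with the breakpoint at column 10. RDP3's implementation differs in detail, for example in how windows and multiple comparisons are handled.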

Mt gene sequencing

Total genomic DNA was extracted from week-old cultures (Table 1) grown on half-strength potato dextrose agar (PDA; Biolab Diagnostics, Wadeville, South Africa) at 25 °C. For amplification of mt genes, Primer3 (http://primer3.sourceforge.net/) was used to design primers targeting the following genes (Supplementary Table S1):

- nad3, nad5 and nad6, encoding nicotinamide adenine dinucleotide (NADH) dehydrogenase subunits;
- atp6, encoding adenosine triphosphate (ATP) synthase subunit 6;
- cox2, encoding cytochrome c oxidase subunit II.

PCR reaction mixtures were adjusted to 25 µl with sterile distilled water. Each contained ca. 5 ng/µl DNA, 0.5 µM of each primer, 250 µM dNTPs (Fermentas, Nunningen, Switzerland), 0.04 U/µl Taq DNA polymerase (Roche Molecular Biochemicals, Mannheim, Germany) and PCR buffer with MgCl2 (Roche). PCR cycling conditions consisted of an initial denaturation at 94 °C for 5 min, followed by 35 cycles of 94 °C for 35 s, the primer-pair-specific annealing temperature (Supplementary Table S1) for 35 s and 72 °C for 90 s, with a final extension at 72 °C for 5 min. Amplification products were precipitated and purified with polyethylene glycol (Hartley & Bowen 2003). They were sequenced in both directions using the original primers, the BigDye® terminator v3.1 cycle sequencing kit (Applied Biosystems, Foster City, CA) and an ABI PRISM® 377 DNA sequencer (Applied Biosystems).

Coalescent-based detection of recombination in mt genes

Individual sequence alignments were collapsed into haplotypes and converted into binary matrices using SNAP MAP (Aylor et al. 2006). Indels and segregating sites that violate the infinite-sites model (i.e. sites with more than two nucleotide states) were excluded. This ensured that the data conformed to the infinite-sites model of mutation, under which at most one mutation event can occur at each site (Kimura 1969). The minimum number of recombination events (Rm) within each binary matrix (gene dataset) was determined using RECMIN (Myers & Griffiths 2002) in SNAP Workbench (Price & Carbone 2005). Rm is based on the four-gamete test of Hudson & Kaplan (1985), which infers recombination between pairs of sites at which all four possible gametic types are present. Finally, minimal ancestral recombination graphs (ARGs) were reconstructed using the BEAGLE branch and bound algorithm (Lyngsø et al. 2005) in SNAP Workbench.
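The binary recoding and the four-gamete test can be sketched in Python as follows. This is an illustration under stated assumptions, not SNAP MAP or RECMIN.

- It drops columns that are monomorphic, contain a gap ("-") or have more than two states.
- It recodes the allele of the first sequence as 0. This is an arbitrary choice; the four-gamete test is unaffected by it.
- It collapses identical haplotypes.
- It computes the Hudson & Kaplan (1985) lower bound Rm as the largest number of non-overlapping intervals between incompatible site pairs.

RECMIN (Myers & Griffiths 2002) computes bounds that are at least as large, so this sketch may report smaller values. All function names are hypothetical.

```python
from itertools import combinations


def to_binary_haplotypes(alignment):
    """Keep gap-free biallelic columns, recode them as 0/1 and collapse identical haplotypes."""
    keep = []
    for p in range(len(alignment[0])):
        states = {s[p] for s in alignment}
        if "-" in states or len(states) != 2:
            continue  # monomorphic, indel-containing or more than two alleles: excluded
        keep.append(p)
    # Allele carried by the first sequence is coded 0 (arbitrary; does not affect the test).
    rows = {tuple(int(s[p] != alignment[0][p]) for p in keep) for s in alignment}
    return keep, sorted(rows)


def incompatible_pairs(haplotypes):
    """Site pairs (i, j) at which all four gametes 00, 01, 10 and 11 are present."""
    n_sites = len(haplotypes[0]) if haplotypes else 0
    pairs = []
    for i, j in combinations(range(n_sites), 2):
        if len({(h[i], h[j]) for h in haplotypes}) == 4:
            pairs.append((i, j))
    return pairs


def hudson_kaplan_rm(haplotypes):
    """Rm: maximum number of incompatible intervals that overlap at most at their endpoints."""
    count, last_right = 0, -1
    # Greedy interval scheduling: always take the interval that ends first.
    for left, right in sorted(incompatible_pairs(haplotypes), key=lambda ij: ij[1]):
        if left >= last_right:
            count += 1
            last_right = right
    return count
```

Take the four toy sequences AAT, ACC, GAC and GCT. Every pair of the three columns shows all four gametes. The intervals (0, 1) and (1, 2) share only an endpoint, so hudson_kaplan_rm returns 2.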

Whole mt genome sequences were not used in these coalescent-based analyses. The high sequence diversity of the intergenic and/or intron regions (Al-Reedy et al. 2012, Fourie et al. 2013) would increase the number of false positive recombination events detected under the infinite-sites model of mutation (McVean et al. 2002). Conversely, the individual mt gene datasets were not subjected to the analytical tools in RDP3. These tools have limited value for detecting recombination in highly conserved regions (Posada 2002, Tsaousis et al. 2005), such as the five mt gene datasets examined here (see below).

Evaluating possible sources of systematic error and/or false positives

The ability to detect recombination in DNA sequences depends on the genetic diversity of the data as well as on among-site rate variation (Posada 2002). Low genetic diversity within a dataset could obscure the recombination signal, whereas rate variation could cause recombination to be detected incorrectly. We therefore estimated the nucleotide diversity, sequence divergence and rate heterogeneity of each individual mt gene dataset. For each dataset, DnaSP version 5 (Librado & Rozas 2009) was used to determine π, the average number of nucleotide differences per site between two sequences (Nei 1987). DnaSP was also used to determine the sequence divergence estimates Dxy and Da, which are, respectively, the average and net numbers of nucleotide substitutions per site between species (Nei 1987). Dxy and Da were used to estimate divergence between species within the FFSC, with F. oxysporum as the outgroup (Cunnington 2007, Pantou et al. 2008). For comparison, π, Dxy and Da values were converted to percentages. jModelTest was used to evaluate the pattern of among-site rate heterogeneity in each gene dataset by estimating the shape parameter (α) of the gamma distribution; smaller α values indicate stronger rate variation (Yang 1996, Posada 2008).
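Nei's (1987) summary statistics used here can be written out directly. The Python sketch below assumes gap-free, equal-length sequences and uncorrected p-distances, which is a simplification: DnaSP handles gaps, missing data and other details in its own way. The function names and toy sequences are hypothetical. Da is computed as Dxy minus the mean of the two within-group π values, i.e. the net divergence. Multiplying by 100 gives the percentages reported in the text.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites between two aligned, gap-free sequences."""
    if len(s1) != len(s2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def nucleotide_diversity(group):
    """Nei's pi: mean pairwise p-distance among sequences of one group (needs >= 2 sequences)."""
    pairs = list(combinations(group, 2))
    return sum(p_distance(a, b) for a, b in pairs) / len(pairs)


def dxy(group_x, group_y):
    """Average number of substitutions per site between sequences of two groups."""
    total = sum(p_distance(a, b) for a in group_x for b in group_y)
    return total / (len(group_x) * len(group_y))


def da(group_x, group_y):
    """Net divergence: Dxy minus the mean within-group nucleotide diversity."""
    within = (nucleotide_diversity(group_x) + nucleotide_diversity(group_y)) / 2
    return dxy(group_x, group_y) - within
```

For two hypothetical groups, X = AAAAAAAAAA and AAAAAAAAAC, and Y = GAAAAAAAAA and GAAAAAAAAC, π is 0.1 within each group, Dxy is 0.15 and Da is 0.05 (i.e. 10 %, 15 % and 5 %).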

Signals of recombination might also be obscured by other evolutionary phenomena, such as directional selection acting on the target genes, and/or by analytical artefacts arising from factors such as substitution saturation. Substitution saturation results in homoplasy (Rubinoff & Holland 2005), which can falsely suggest recombination because recurrent mutations (i.e. mutation hot spots) and recombination can generate similar patterns of genetic variability (Eyre-Walker et al. 1999, Hagelberg 2003, Galtier et al. 2006). In addition, recurrent mutation could result from selection pressure acting on the target genes as a whole or on specific regions within them (Nielsen 2005, Reed & Tishkoff 2006). We therefore tested whether positive selection acted on the mt gene datasets and determined the level of substitution saturation in these datasets.

Specific sites under positive or negative selection were identified using three codon-based maximum likelihood methods from Datamonkey (http://www.datamonkey.org/), developed by Kosakovsky et al. (2005): the Fixed Effect Likelihood (FEL), Random Effect Likelihood (REL) and Single Likelihood Ancestor Counting (SLAC) methods. FEL estimates ω, the ratio of non-synonymous to synonymous substitution rates (dN/dS), for each site in a sequence alignment. REL allows both non-synonymous and synonymous rates to vary among sites, under a general underlying nucleotide substitution model. SLAC reconstructs ancestral sequences using joint likelihood reconstruction in codon-state space and counts synonymous and non-synonymous changes at each site (Kosakovsky et al. 2005). Results from all three methods were compared, and only codons identified as being under selection by all methods were considered.
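The counting logic of SLAC can be illustrated with a one-sided binomial test for a single codon. The Python sketch below is a simplification that rests on three assumptions:

- the numbers of non-synonymous and synonymous changes at the codon have already been inferred from the reconstructed ancestral sequences;
- these numbers are integers;
- the fraction of changes expected to be non-synonymous under neutrality is known.

Datamonkey's SLAC works with counts averaged along the tree, which need not be integers. The function names are hypothetical, and alpha = 0.1 mirrors the FEL/SLAC significance level reported in the Results.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); returns 0 when k < 0."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))


def slac_like_site_test(n_nonsyn, n_syn, prop_nonsyn_sites, alpha=0.1):
    """Classify one codon from integer counts of inferred substitutions.

    prop_nonsyn_sites: expected fraction of changes that are non-synonymous
    under neutrality (non-synonymous sites / all sites at this codon).
    Returns (label, p_negative, p_positive).
    """
    n = n_nonsyn + n_syn
    if n == 0:
        return "neutral", 1.0, 1.0
    # Negative (purifying) selection: fewer non-synonymous changes than expected.
    p_negative = binom_cdf(n_nonsyn, n, prop_nonsyn_sites)
    # Positive selection: more non-synonymous changes than expected.
    p_positive = 1.0 - binom_cdf(n_nonsyn - 1, n, prop_nonsyn_sites)
    if p_negative < alpha:
        return "negative", p_negative, p_positive
    if p_positive < alpha:
        return "positive", p_negative, p_positive
    return "neutral", p_negative, p_positive
```

Consider a codon with six synonymous changes and no non-synonymous changes, where 75 % of changes would be non-synonymous under neutrality. It gives p = 0.25^6 ≈ 0.00024 and is classified as under negative selection.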

The level of substitution saturation was measured by calculating the entropy-based index of substitution saturation (ISS; Xia et al. 2003) with DAMBE5 (Xia 2013). In this approach, substitution saturation is assessed by testing whether the observed entropy of the sites is significantly smaller than the entropy expected under full substitution saturation. ISS was compared with its critical value, ISS.C. The critical value depends on the tree topology, the number of taxa, the sequence length, the nucleotide frequencies and the transition/transversion ratio, and it is obtained from simulations over a set of tree topologies given the alignment (Xia et al. 2003, Xia & Lemey 2009). The third codon position is more variable owing to the wobble effect of the genetic code (Spencer & Barral 2012), and is thus likely to experience more substitutions. Substitution saturation was therefore determined separately for the first plus second codon positions and for the third codon positions of each gene dataset.
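The ratio at the heart of this test can be sketched in Python. Following Xia et al. (2003), ISS is taken here as the mean observed site entropy divided by H_FSS, the entropy expected under full substitution saturation. H_FSS is computed exactly by enumerating the multinomial nucleotide counts for n sequences drawn independently from the observed base frequencies. This sketch assumes a gap-free A/C/G/T alignment and does not simulate the critical value ISS.C, which DAMBE derives from simulated trees. The function names are hypothetical.

```python
import math
from collections import Counter

BASES = "ACGT"


def site_entropy(column):
    """Shannon entropy (bits) of the nucleotide composition of one alignment column."""
    n = len(column)
    return -sum((c / n) * math.log2(c / n) for c in Counter(column).values())


def expected_entropy_full_saturation(freqs, n):
    """H_FSS: expected column entropy when each of n sequences draws its base
    independently from freqs (A, C, G, T), i.e. when no phylogenetic signal remains."""
    expected = 0.0
    for a in range(n + 1):
        for c in range(n - a + 1):
            for g in range(n - a - c + 1):
                counts = (a, c, g, n - a - c - g)
                if any(k > 0 and f == 0 for k, f in zip(counts, freqs)):
                    continue  # composition has probability zero
                # Multinomial log-probability of this composition.
                log_p = math.lgamma(n + 1) + sum(
                    k * math.log(f) - math.lgamma(k + 1)
                    for k, f in zip(counts, freqs) if k > 0
                )
                h = -sum((k / n) * math.log2(k / n) for k in counts if k > 0)
                expected += math.exp(log_p) * h
    return expected


def iss(alignment):
    """Index of substitution saturation: mean observed site entropy / H_FSS."""
    columns = list(zip(*alignment))
    h_obs = sum(site_entropy(col) for col in columns) / len(columns)
    counts = Counter(base for seq in alignment for base in seq)
    total = sum(counts[b] for b in BASES)
    freqs = [counts[b] / total for b in BASES]
    return h_obs / expected_entropy_full_saturation(freqs, len(alignment))
```

Codon positions can be analysed separately by passing subsets such as [s[2::3] for s in alignment] for third positions, or [''.join(b for k, b in enumerate(s) if k % 3 != 2) for s in alignment] for first plus second positions.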

Results

Mt genome sequencing and assembly

Pyrosequencing, together with Sanger sequencing, allowed assembly of the mt genomes of Fusarium temperatum and F. mangiferae. In both cases, the sequences spanned the entire replicon except for a gap containing the large subunit ribosomal RNA gene and three of the tRNA gene clusters, i.e. tRNA gene clusters 2, 3 and 4 (Fourie et al. 2013). None of the pyrosequencing reads mapped to the corresponding region in the mt genomes of F. circinatum, F. fujikuroi and F. verticillioides, for which this region has been sequenced. The repetitive nature of these regions (Fourie et al. 2013) also precluded their amplification and sequencing, despite various attempts with multiple primer pairs.

The mt genome sequences of F. temperatum (GenBank KP742837) and F. mangiferae (GenBank KP742838) contained the 14 known mt protein-coding genes (Fig. 1), whose products are involved in oxidative phosphorylation. Within these protein-coding genes, 12 and four group I introns were found in the F. temperatum and F. mangiferae mt genomes, respectively. Both species contained an intron in their cob gene. In F. temperatum, the cox1 gene contained eight introns and the cox3, nad1 and nad2 genes each contained one intron. In F. mangiferae, by contrast, cox1 contained three introns and cox3 and nad2 were free of introns (Table 2).

Fig. 1

Annotated map of the mitochondrial genome fragment of Fusarium temperatum (KP742837). The fragment encodes the 14 protein-coding genes of the oxidative phosphorylation pathway (blue = entire gene; yellow = coding sequence), one rRNA (red), five tRNAs (red) and tRNA cluster 1.

Table 2 The number of introns identified in the mitochondrial protein coding genes of the Fusarium species used in this study.

Although the intron content of protein-coding genes varied greatly among and within the species examined, comparison of the five mt genomes suggested a possible link between intron abundance and FFSC clade (Table 2). For example, the mt genomes of the “American” clade species F. circinatum and F. temperatum both contained 14 introns. By contrast, the mt genes of the “Asian” clade species F. fujikuroi and F. mangiferae contained one and four introns, respectively, and those of the “African” clade species F. verticillioides contained three. Outside tRNA clusters 2, 3 and 4 (the unsequenced fragment), the mt genomes of both F. temperatum and F. mangiferae contained tRNA cluster 1, which encodes four tRNA genes, as well as nine individual tRNA genes (Fig. 1). In both assemblies, all protein-coding and sequenced tRNA genes had the same order and orientation (Fig. 1), as described for other FFSC mt genomes (Al-Reedy et al. 2012, Fourie et al. 2013).

Mt genome-based recombination analysis

All four recombination detection tools identified recombinant regions within the FFSC mt genomes examined in this study (Supplementary Table S2). The consensus of the four methods suggested five significant recombinant regions, in both genic and intergenic regions (Table 3). For example, recombinant regions were detected in the intergenic region between the tRNA gene for cysteine and the cox1 gene, as well as between cob and the tRNA gene for arginine. Recombination was also detected within the atp6, atp9 and cox2 genes. For all five detected recombination events, RDP, Bootscan, Geneconv and Maximum χ2 predicted F. circinatum, F. mangiferae or F. temperatum to be the recombinant (daughter) sequence, although the major and minor parents could not always be identified (Table 3).

Table 3 The recombinant regions detected by RDP, Geneconv, Bootscan and Maximum χ2 in the Fusarium mitochondrial genomes.

Coalescent-based detection of recombination in mt genes

Within the five mt gene sequence datasets examined for the 27 Fusarium isolates included in this study, 17 recombination events were detected using RECMIN. The estimated minimum number of recombination events needed to explain the incompatibilities in the individual datasets were 0, 0, 1, 3, 3, and 10 for the nad3, cox2, atp6, nad6 and nad5 datasets, respectively. This suggested that recombination occurred in all the datasets examined with the exception of nad3 and cox2, and that recombination was extensive in nad5.

SNAP MAP collapsed the 27 sequences of each mt dataset into haplotypes. There were six haplotypes in each of the nad3 and cox2 datasets, 10 in each of the nad6 and atp6 datasets and 15 in the nad5 dataset (Supplementary Table S3). Each haplotype typically comprised either the representatives of a single species, or the representatives of closely related (sister) species combined into a single haplotype (Supplementary Table S3).

Consistent with that suggested by the minimum number of recombination events, the ARG analysis suggested extensive recombination in the sequences of nad5, with some recombination in the sequences of atp6 and nad6 and no recombination in the sequences of nad3 and cox2 (Fig. 2). The ARG analysis also allowed identification of the recombinant region within each mt gene dataset. For example, it identified a recombination region within the atp6 dataset at nucleotide position 65 (Fig. 2, Supplementary Table S4).

Fig. 2

Minimum ancestral recombination graphs (ARGs) inferred from the FFSC atp6, nad5 and nad6 datasets using the BEAGLE branch and bound algorithm in SNAP Workbench 2.0 (Price & Carbone 2005). Haplotypes are colour coded according to the FFSC clade to which the species belong (“African” = red, “American” = blue, “Asian” = green). Circles with numbers represent recombination events; the number within each circle is the nucleotide position of the recombination event in that dataset. After a recombination event, the two sequences are replaced with a recombinant consisting of a prefix (P) from one sequence and a suffix (S) from the other. Numbers on the branches indicate the number of mutation events before coalescence of the specific haplotypes.

The ARGs were also used to determine the relative order of recombination events and to evaluate the contribution of mutation and coalescent events. Overall, the “American” clade haplotypes were associated with the deepest recombination events in the ARGs inferred from the nad5 (positions 1629 and 1026) and nad6 (position 332) datasets. The atp6, nad6 and nad5 ARGs also suggested that recombination occurred more recently in the evolutionary history of the FFSC, based on events that emerge towards the tips of the ARGs. Examples are the event at position 65 of the atp6 dataset and those at positions 503 and 515 of the nad6 dataset. Finally, the nad6 and nad5 ARGs suggested that recombination occurred between clades of the FFSC. For example, the recombination event at position 1629 of the nad5 dataset (haplotype H4) involved ancestral sequences that gave rise to haplotypes H5 and H6, which together represent both the “African” and “Asian” clades (Fig. 2).

Evaluating possible sources of systematic error and/or false positives

To evaluate the effect of systematic error on our coalescent-based analyses, various additional parameters were estimated and examined. Failure to account appropriately for the complex properties of the individual gene datasets could lead to false detection or non-detection of ancestral recombination events. These analyses therefore indicate how robust the conclusions drawn from the RECMIN and ARG results are. Based on these analyses, neither nucleotide diversity nor substitution saturation (homoplasy) appeared to be a significant source of systematic error. The average nucleotide diversity (π) estimated for the mt datasets (see Table 1 for EMBL nucleotide sequence database accession numbers) was low, ranging from 0.4 to 1 % (Table 4). The sequence divergence estimates Dxy and Da between the FFSC and F. oxysporum were also low, ranging from 0.9 to 2 % and from 0.3 to 1.5 %, respectively (Table 4). To assess substitution saturation, the index of substitution saturation (ISS) was compared with its critical value (ISS.C). ISS values were significantly smaller than ISS.C values (Table 5) for all five gene datasets, as well as for the subsets containing only first plus second codon positions or only third codon positions. This suggests negligible substitution saturation.

Table 4 Gene diversity, divergence and rate heterogeneity of the mitochondrial genes (nad3, nad5, nad6, atp6 and cox2) examined in this study.
Table 5 Index of substitution saturation as well as the critical value of the index of substitution saturation used to measure the level of substitution saturation within the FFSC mitochondrial genes (nad3, nad5, nad6, atp6, and cox2).

Selection analyses with FEL, REL and SLAC detected no codons under positive selection. In contrast, the FEL, REL and SLAC analyses identified codons under negative (purifying) selection in all datasets (Supplementary Table S4). For example, 13, 10, 6, 31 and 20 codons of the atp6, cox2, nad3, nad5 and nad6 datasets, respectively, were identified as being under negative selection by one or two of the methods. The consensus results, however, suggested that only nad5 and nad6 had codons under negative selection. These were three codons in nad5 (codon 139, methionine; codon 287, phenylalanine; codon 356, leucine) and one in nad6 (codon 116, serine). Significance thresholds were p = 0.1 for FEL and SLAC and a Bayes factor of 50 for REL. None of these consensus codons overlapped with the recombination events suggested by the ARGs.

The only potential source of systematic error in our coalescent-based analyses was among-site rate heterogeneity. The gamma shape parameter varied among the mt gene datasets (Table 4). It was particularly low for the atp6 and cox2 datasets (α = 0.01, as opposed to α ≥ 0.3 for the other three datasets). This indicates strong among-site substitution rate variation, which could lead to the detection of false positive recombination events. The RECMIN and ARG analyses did not predict any recombination events in the cox2 dataset, but the single recombination event predicted in the atp6 dataset likely represents an analytical artefact (Posada 2002).

Discussion

The aim of this study was to find evidence of heteroplasmy-associated recombination between the mt genomes of species in the FFSC. Both the direct and the coalescent-based methods detected heteroplasmy-associated recombination. This supports previous suggestions that gene flow or introgressive hybridization occurred in the history of the FFSC (O’Donnell & Cigelnik 1997, Fourie et al. 2013). In addition, the detection of recombination by both methods provided evidence that introgressive hybridization occurred at both ancient and more recent timescales.

Both approaches used in this study allowed the identification of interspecies recombination events in the ancestry of the FFSC. The methods implemented in the RDP3 package infer recombination events directly from the sequence information provided (Martin et al. 2010). The ARGs, in turn, provide information on the order in which recombination and mutation occurred over evolutionary time (Lyngsø et al. 2005). In other words, ARGs are statistical descriptions of the genealogical history of each mt gene sequence, traced backwards in time to the most recent common ancestor (Griffiths 1999, McVean et al. 2002, Lyngsø et al. 2005). Examples in fungi are limited, but RDP3’s direct detection methods have revealed recombination and hybridization in several other groups. These include free-ranging Australian lizards (Ujvari et al. 2007), scorpion species in the family Buthidae (Gantenbein et al. 2005) and divergent populations of the nematode Globodera pallida (Hoolahan et al. 2012). ARG analysis has likewise revealed introgression or hybridization in organisms such as hydrothermal vent mussels (Faure et al. 2009).

We performed extensive analyses to ensure that the putative ancestral recombination events detected in this study were not due to a failure to account for the inherent evolutionary complexity of the data (Posada 2002, Delsuc et al. 2005). Although nucleotide substitution saturation may be common in mt genomes (Spencer & Barral 2012, Gaillardin et al. 2012), little evidence for it was found. This was expected, given that the FFSC mt datasets were highly conserved. However, as previously described for mt genes (Dowling et al. 2008, Rand 2001, Stewart et al. 2008, Soares et al. 2009), the FFSC mt gene datasets contained evidence of selection at specific codons. Only purifying selection was detected (no codons were under positive or diversifying selection), and none of the consensus sites (i.e. those identified by all methods) occurred in the recombinant regions identified. Also as expected (Excoffier & Yang 1999, Ingman et al. 2000), strong among-site rate variation was detected in some of the mt genes examined, which may be linked to the function or structure of their products (Yang 1996). The infinite-sites model of mutation (Kimura 1969) used in the ARG analyses does not accommodate this phenomenon. Even so, only one putative recombination event was detected in a dataset associated with strong among-site rate variation. Taken together, these results indicate that the putative recombination events identified do not represent analytical artefacts, because the sources of systematic error in the FFSC mt gene datasets were limited.

One concern that could not be fully eliminated in the current study is that the DNA sequence signatures of introgression or hybridization are not readily distinguishable from those of incomplete lineage sorting or deep coalescence (Maddison 1997, Degnan & Rosenberg 2009). Incomplete lineage sorting typically manifests as polymorphisms that persist through several speciation events; i.e. divergence and drift-associated random sorting of an ancestral polymorphism did not lead to its differential fixation in the resulting species (Maddison 1997). In ancestral recombination analyses, incompletely sorted polymorphisms would thus “behave” in a similar manner to those originating from interspecies gene flow (Degnan & Rosenberg 2009). Although we could not rule out its involvement in our analyses, incomplete lineage sorting is unlikely to have affected all of the recombinant sites/regions identified. This is especially true for the long stretches of recombinant sequences (199–1668 bp) detected among the genomes of the “Asian” and “American” clade species included (see Table 3). Future studies should however investigate the role of incomplete lineage sorting in the evolution of FFSC by employing statistical approaches to distinguish gene flow and incomplete lineage sorting based on whole genome sequence data (Joly et al. 2009).

The mechanisms that would allow recombination between the mt genomes of different FFSC species are unknown. It is currently hypothesized that recombination between different mt genomes can occur via the dispersed repeat elements they harbour, via exchange between highly conserved regions, or via intron homing (Basse 2010, Galtier 2011). Recombination between dispersed repeat elements is common in plant mitochondria, in which the repeats serve as crossover points for homologous recombination (Palmer & Herbon 1988). Recombination via intron homing occurs when fungal mitochondrial introns that encode LAGLIDADG or GIY-YIG homing endonucleases move into previously intron-less copies of a gene (Goddard & Burt 1999, Haugen et al. 2005, Stoddard 2006). Given the overall conservation of gene order in the intron-rich mt genomes of the FFSC described here and previously (Al-Reedy et al. 2012, Fourie et al. 2013), recombination via intron homing and/or exchange between conserved regions is potentially more likely than recombination via dispersed repeats, which would be expected to produce gene order rearrangements. In general, however, mt recombination in fungi is expected to employ mechanisms that differ markedly from those inferred for animals, whose mt genomes typically lack introns and dispersed repetitive elements (Rokas et al. 2003, Piganeau et al. 2004).

Overall, the findings presented here indicate that interspecies or heteroplasmy-associated gene flow and recombination occurred at both ancient and recent timescales during the evolution of the FFSC. The results of the ARG analyses (especially the nad5 and nad6 ARGs) provide evidence for older and/or ancient recombination within the FFSC. It is conceivable that such recombination and subsequent introgression events could have occurred in the Miocene, before the biogeographic separation of the clades (O’Donnell et al. 2013), when diversification of the complex coincided with the radiation of grasses and eudicots that use C4 photosynthesis to fix carbon (Edwards et al. 2010, Christin et al. 2011, O’Donnell et al. 2013). During this period, environmental conditions likely influenced the distribution of the ancestral FFSC members, thus providing the opportunity for introgressive hybridization to occur (Olson & Stenlid 2002, Schardl & Craven 2003).

Recombination events that occurred at more recent timescales were revealed by the tools implemented in RDP3 (Martin et al. 2010). All of these methods detect recombination by identifying regions in which the sequence similarity between individuals is unusually high compared with their overall similarity, as estimated from the entire region and/or genome in question. However, post-recombination mutations (i.e. those that accumulate over evolutionary time after the original interspecies gene flow event) would progressively make these recombinant regions harder to distinguish from background variation (Posada 2002). Accordingly, recombinant regions detected by these methods most likely represent sites at which the signatures of recombination have not yet been eroded by normal mutational processes. The five significant regions of recombination detected here (see Table 3) were thus the result of interspecies gene flow events that occurred relatively recently in the history of the FFSC, although information regarding the geographic contact points of these events is lacking. Also, many FFSC species are inter-fertile under laboratory conditions (Desjardins et al. 2000, Leslie et al. 2004a), and a natural hybrid has been described from native tall-grass prairie south of Manhattan, Kansas (Leslie et al. 2004b, 2007). Overall, the results of this study show that interspecies gene flow and introgressive hybridization have played an important role in the evolution of the FFSC and will likely continue to do so. However, the extent to which these phenomena will influence the evolution of the complex, and at what point new species will emerge, remain to be determined.
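The shared idea behind these similarity-based methods can be sketched as a sliding-window scan that compares local pairwise identity with whole-alignment identity. This is a deliberately simplified illustration, not a reimplementation of any RDP3 method: the function names, window size, step and `excess` threshold are hypothetical choices, and the actual methods assess significance statistically rather than with a fixed cut-off.

```python
def pairwise_identity(a, b):
    """Fraction of aligned, ungapped positions at which a and b agree."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def high_similarity_windows(a, b, window=50, step=10, excess=0.1):
    """Return (start, end, identity) for every window whose local identity
    exceeds the whole-alignment identity of a and b by at least `excess`.

    A toy, hypothetical detector: no significance test is performed.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    overall = pairwise_identity(a, b)
    flagged = []
    for start in range(0, len(a) - window + 1, step):
        end = start + window
        local = pairwise_identity(a[start:end], b[start:end])
        if local - overall >= excess:
            flagged.append((start, end, local))
    return flagged


# Simulated pair: divergent at every third site except in positions 80-139,
# where the two sequences are identical (a mock "recombinant" tract).
a = "A" * 200
b = "".join("A" if 80 <= i < 140 or i % 3 else "C" for i in range(200))
for start, end, local in high_similarity_windows(a, b):
    print(start, end, round(local, 2))
```

With these settings the scan flags the four windows starting at positions 70, 80, 90 and 100, i.e. those overlapping the identical tract. Introducing random substitutions into that tract lowers its local identity toward the background level until it is no longer flagged, which mirrors how post-recombination mutations erode the detectable signal of older events.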