Recombination provides evidence for ancient hybridisation in the Silene aegyptiaca (Caryophyllaceae) complex

Recombination events among distinct alleles complicate phylogenetic estimation. Various in vivo and in vitro processes can bring distinct alleles into the same genome to then undergo recombination, which may subsequently mislead phylogenetic inference if not assessed properly. Among the processes bringing divergent alleles together, hybridisation is perhaps the simplest and most likely, but alternatives need to be considered before hybridisation can be accepted as the underlying cause. Such alternatives include the presence of paralogues or deeply coalescing alleles, as well as amplification artefacts. Here, we document a recombination event that apparently took place between two divergent lineages of the Silene aegyptiaca complex in the flowering plant family Caryophyllaceae. We evaluate several possible mechanisms that might be responsible for the observed pattern. An ancient introgressive hybridisation event was the simplest explanation for the observations, compatible with geographic proximity of the affected lineages, whereas paralogy and deep coalescence are difficult to reconcile with the evidence obtained from a species tree of the group based on six different, non-recombinant genes and gene trees inferred using two partitions of the recombinant locus.


Introduction
Recombination among distinct allele lineages can pose a challenge for phylogenetic inference of gene trees (Sanderson and Doyle 1992;Posada and Crandall 2002). In vivo recombination occurs during meiosis when a chromatid of each homologous chromosome crosses over and is exchanged, thus potentially forming chimaeric alleles. The presence of chimaeric alleles has the potential to introduce additional homoplasy into a dataset if non-monophyletic alleles contributed to the chimaera (Sanderson and Doyle 1992). The result can be the disruption of the historical signal, which in turn can lead to incorrect gene tree inference (Sanderson and Doyle 1992;Posada and Crandall 2002), biased branch length (Penny et al. 2008) and node age (Pfeil 2009) estimation.
Although recombination can be seen simply as a confounding process, it can also be an opportunity to detect the footprint of ancient hybridisation (Wendel and Doyle 1998;Linder and Rieseberg 2004;Rautenberg et al. 2008;Kelly et al. 2010). Hybridisation has the potential to bring together divergent alleles into the same genome, where they can then be recombined. If a recombined allele becomes fixed and persists to be sampled, it may be possible to infer which lineages contributed to this allele, for example, by separate phylogenetic analysis of non-recombined parts of the locus (Rautenberg et al. 2008;Kelly et al. 2010;Schwartz et al. 2015) combined with multi-labelling approaches (Visser et al. 2012). However, competing explanations for the origins of chimaeric alleles need to be considered before hybridisation can be inferred (e.g. Rautenberg et al. 2008;Frajman et al. 2009).

Bernard E. Pfeil and Zeynep Toprak are equal contributors.
Electronic supplementary material The online version of this article (doi:10.1007/s13127-017-0331-9) contains supplementary material, which is available to authorized users.
One alternative explanation is that deeply coalescing allele lineages have contributed to the formation of a chimaera. Coalescent theory predicts that allele lineages that differ in selectively neutral ways can be maintained for some time after speciation in descendent species, depending on the effective population size of the locus (e.g. Rosenberg 2003). If the allele lineages persist through a second speciation event, with one allele lineage being passed to both descendent species, but the other surviving only in one descendent (the deeply coalescing allele lineage), then recombination between the surviving lineages in the one species carrying both lineages can create chimaeric alleles. One part of such a chimaeric allele may indicate a closer relationship of the chimaera to alleles from the closely related species (the non-deeply coalescing lineage), but the other part could present a different relationship, as deeply coalescing alleles can coalesce with allele lineages from different species when polymorphism occurs in a common ancestral species (Rosenberg 2003). Thus, different topologies may occur in the separate trees of recombined portions of alleles, even though hybridisation has not occurred.
Another explanation is that alleles residing at paralogous loci may occasionally pair during meiosis (e.g. through unequal recombination among proximal copies) and also contribute to the formation of recombined alleles (Jelesko et al. 2004;Yandeau-Nelson et al. 2006). If the sampling of alleles at each locus is incomplete, or some descendent loci have been lost, the origins of the chimaeric allele may mimic the topological expectations of deep coalescence.
A third alternative explanation is recombination during the data acquisition steps, such as might be caused by template switching during PCR amplification (Cronn et al. 2002). If paralogous loci reside in the genome, or contaminating DNA from a related species is present, suitable templates for template switching could be available to generate chimaeric alleles. In the case of contamination, the recombined alleles would be expected to be identical (or nearly identical, within the sequencing error margin) to different alleles in the dataset at different parts of the sequence (e.g. Pfeil et al. 2004). This could mimic the pattern expected for recombination that followed a very recent hybridisation event where the introgressed alleles have not diverged from the originating alleles.
In an initial survey of seven loci selected to study species delimitation in the Silene aegyptiaca complex (below), we found indications of recombination in one of the low copy nuclear (LCN) loci. This locus was of necessity excluded from the earlier study (Toprak et al. 2016), because it violates the assumptions of the methods employed there (Heled and Drummond 2010). Here we examine the sequence patterns within this locus further and test (1) whether the evidence for recombination in the LCN locus is compelling, (2) how many recombination events might have taken place, (3) which lineages have contributed to the formation of chimaeric alleles and (4) which alternative explanations can be ruled out in order to infer the origin(s) of the chimaeric sequences. We also provide a tentative evaluation of which modes of speciation are more compatible with the cause of recombination and discuss the implications this has with respect to the porous nature of the barriers to gene flow between these species in the S. aegyptiaca complex.
The S. aegyptiaca (Caryophyllaceae) complex The S. aegyptiaca (L.) L. fil. 1782 species complex includes a number of highly similar annual species that are found in the Eastern Mediterranean Region, especially southern Anatolia (Coode and Cullen 1967;Erixon and Oxelman 2008;Aydin et al. 2014a). A recent study based on six nuclear loci revealed several cryptic species in the group (Toprak et al. 2016). The analysis was based on a Bayesian multispecies coalescent analysis that tests the assignment of samples to species while co-estimating the species tree (DISSECT; Jones et al. 2015). A conservative interpretation of the results suggested that the observed species diversity was partitioned into three main clades, containing a total of nine putative taxonomic species (colour coded) that are partly geographically structured ( Fig.  1; modified from Toprak et al. 2016). The evidence for these nine species, exhibited by monophyletic gene trees, is strong. If samples come from the same species, then recombination among them is expected for sexually reproducing organisms. However, recombination among species can provide evidence for rare unexpected gene flow.

Material and methods
Data generation DNA sequences of the LCN locus EST09 were generated from dried material of members of the S. aegyptiaca group (see Petri et al. 2013, for details on how the locus was discovered, assessed to be low copy and PCR primers designed). Protocols of molecular methods including DNA extraction, PCR amplification, sequencing and multiple alignment can be found in Aydin et al. (2014b), and vouchers are presented in Toprak et al. (2016). For the cases where the initial sequences were polymorphic, sequence-specific primers were used to separate allele copies (Scheen et al. 2012).

Data analysis
The alignment was tested for recombination using RDP4 (Martin et al. 2010). An initial p value of 0.1 was used with three detection methods (RDP, MaxChi and Chimaera) to screen for recombination events at a relaxed stringency (high type 1 error). Any putative recombination events found by any of these methods were then re-checked with all methods. Only events with p ≤ 0.05 in at least two methods, and with corresponding phylogenetic evidence consistent with recombination, were considered to be real. Although several events were inferred within species and excluded from further analyses (not shown), we focused only on the one event that we report here that is a case of recombination between species.
We checked the phylogenetic pattern of the putative recombination event first within RDP4, using UPGMA trees (not shown), but then by running a Bayesian analysis (using MrBayes v3.2; Ronquist and Huelsenbeck 2003) run for 500,000 generations in two parallel runs with two chains each (sampling every 1000 generations), with mixed substitution models and the addition of a gamma parameter (otherwise default settings). The alignment was partitioned into two parts, those either side of the recombination breakpoint estimated by RDP4 (positions 1-813 and 814-1573) and run separately.
We then ran the putatively recombined part of the alignment (positions 814-1573) in BEAST v1.8.0 (Drummond et al. 2012) for 10,000,000 generations (sampling every 5000 generations), using an uncorrelated log-normal relaxed clock model, to derive the relative age of the recombination event with respect to speciation events. We calibrated the gene tree using the 95% HPD (highest posterior density) interval of the species tree root height from the previous DISSECT analysis (Toprak et al. 2016) as the 95% central interval of a normal distribution as the prior on the gene tree root height.
We used the HKY substitution model, which was the most visited in the MrBayes mixed model analysis, with the addition of a gamma among-site rate heterogeneity parameter, as in the previous analyses. We changed the relaxed clock mean prior (ucld.mean) to a broad uniform prior (1 × 10 −6 to 1 × 10 2 ) and subsequently checked that these limits were not encountered as a boundary by inspection in Tracer after the analysis.
We did not calibrate the BEAST tree in absolute time units, because the evidence for a specific calibration is scanty (e.g. fossil record, mutation rate). For the purpose of the present study, absolute ages are not needed, as we are just interested in whether the recombination event took place before or after a speciation event. Therefore, we use relative time units (scaled as substitutions per site) and a prior density given by a species tree based on six other loci.

Recombination results
We found one recombination event using RDP4 that was supported by four methods with the following p values: GeneConv p = 0.02; BootScan p = 0.01; MaxChi p = 0.03; Chimaera p < 0.01. Three sequences belonging to the individuals 15,247, 3290 and 15,363 (all from the S. aegyptiaca  Fig. 1) shared this event. The recombination breakpoint was suggested to be around position 813.
The MrBayes analyses of the 5′ partition of the alignment (1-813) converged quickly and produced a consensus tree with two of these three sequences in a supported clade with a member of the same group based on the DISSECT species tree, individual 15,223 (Fig. 2). All three sequences are members of a large clade consisting of individuals from distinct lineages (yellow, green, blue and lilac) of S. aegyptiaca complex, with little supported resolution within it but clearly distinct from the S. atocioides Boiss. (pink) group sequences (posterior probability [PP] = 1.0; the colour coding for species is shown in Fig. 1). The light pink group sequences could be sister to the S. atocioides (pink) group, if one unresolved node is favourably resolved (to shift the clade with individuals 15,350 and 15,146 to be sister to the S. atocioides (pink) group, thus making this partition of the gene somewhat compatible with the species tree in Fig. 1.
Phylogenies inferred using the 3′ partition (814-1573), however, produced a tree with the putative recombinant sequences instead in a single clade and strongly supported as sister to the clade of the S. atocioides (pink) group (Fig. 3). This result is strongly in conflict with the species tree inferred by DISSECT (Fig. 1), *BEAST (shown in Toprak et al. 2016) and the 5′ partition of the same locus (Fig. 2). Three strongly supported branches (each PP = 1.0) exclude the placement of these three sequences with the sequence two of them are most closely related to in the 5′ partition (individual 15,223).

BEAST dating results
The dated tree of positions 814-1573 inferred by BEAST also converged quickly. The coefficient of variation did not abut zero (mean = 0.37), indicating that a relaxed clock was appropriate and that the rate variation was not excessive. The relative age of the recombination event (i.e. the divergence time of the recombinant clade from the sister group, 0.026) was approximately half the age of the species tree root height (0.049) and did not overlap with the latter age (95% HPD 0.016-0.036 and 0.039-0.060, respectively, Fig. 4).

Evidence for recombination
We found clear evidence for recombination supported by several methods. Phylogenetic analysis of the partitions from either side of the inferred breakpoint produced supported conflicting topologies consistent with the inferred recombination. These analyses implicate the S. atocioides (pink) group as the closest extant relative of the donor lineage of the 3′ partition of EST09. The six genes analysed previously (Toprak et al. 2016) place the individuals carrying the recombined EST09 sequences in a background consistent with other members of the S. aegyptiaca (yellow) group (Fig. 1). This suggests that a small proportion of the genome has successfully introgressed The monophyly of the three sequences in the 3′ part of EST09 suggests that a single successful introgression event took place, with subsequent divergence of allele lineages after the event. The lack of monophyly of these sequences in the 5′ part of the gene is compatible with this scenario, if the 3′ donor sequence was independently recombined at least twice (to explain the two supported different positions in Fig. 2) with different alleles present in the host species, S. aegyptiaca (yellow), after the introgression event (Fig. 5). Recombination hot spots are well known (Lichten and Goldman 1995;Hey 2004), so recombination in approximately the same position is not unexpected.

Origin of recombination via introgressive hybridisation
If our calibration is correctly placed, the donor lineage of the recombination S. atocioides group (pink) had already diverged from the remaining species, including the S. aegyptiaca (yellow) group, by the time the recombination occurred. The S. aegyptiaca (yellow) group, on the other hand, diverged from its sister taxon S. aegyptiaca subsp. ruderalis (lilac) (Fig. 1) at 0.017-0.031, which is largely overlapping with the time of recombination. However, the crown age of the S. aegyptiaca (yellow) group only barely overlaps with the recombination time (0.009-0.018). This suggests that the recombination event most likely took place in the stem of the S. aegyptiaca (yellow) group, presumably shortly after speciation, because no trace of the event has yet been found in the S. a. subsp. ruderalis (lilac) group. However, caution should be taken in this respect, because more extensive sampling within the S. a subsp. ruderalis (lilac) group may yet reveal recombinant sequences within that lineage.
The timing of the recombination event is consistent with ancient hybridisation that has brought together disparate alleles into the same genome and allowed meiotic recombination to occur (Fig. 5). In contrast, the timing we observe for the divergence between the position of the recombined sequences in the 5′ and 3′ parts of the gene is not consistent with paralogy or deeply coalescing alleles as the source. This is because the divergence between the source lineage and the recombinant partition is around the same age as the divergence between the yellow and lilac groups, rather than earlier (i.e. earlier than the speciation of all species here) as would be expected for a paralogue or a deeply coalescing allele drawn from the yellow group that has its closest relationship to the pink group).
To explain the source of the recombined alleles by either paralogous loci or deeply coalescing alleles requires several additional ad hoc assumptions. For paralogy, a gene duplication (one event), loss of copies to leave a single copy per individual (several events) and finally two unequal recombinations between the paralogous copies (two events) would be required to explain the observations. A similar number of events are also required for recombination involving a deeply  Fig. 3 Consensus phylogram inferred using MrBayes of the 3′ partition of EST09. Numbers above branches are the clade posterior probabilities. The root position was placed along the long branch that separated the S. atocioides (pink) group individuals and close relatives from the majority of other individuals, which is also consistent with mid-point rooting. The putatively recombined sequences are marked by red branches. The scale bar is in units of substitutions per site coalescing allele, but especially important is the maintenance of multiple allele lineages through several speciation events despite genetic drift. This is considered unlikely, because the relevant branches in the species tree ( Fig. 1) are also observed in most genes trees when independently analysed (not shown).
These explanations are less parsimonious than introgressive hybridisation, which requires the introgression itself (one event) and two ordinary meiotic recombinations (two events). The hybridisation is also compatible with the geographic proximity of the S. atocioides (pink) and S. aegyptiaca (yellow) groups, with the individuals bearing the recombined sequences found along the westernmost part of the distribution of the S. aegyptiaca (yellow) group, closest to the S. atocioides (pink) group individuals (Fig. 6).
We also rule out template switching during amplification (where the templates in question are derived from either the paralogy or deep coalescence scenarios above) as a viable explanation of the origin of the recombined sequences. This is for the simple reason that template switching during amplification requires the presence of the templates that the polymerase jumps between, in the same DNA extract (Pääbo et al. 1990). This means that both templates must have been in the genome to start with, which generates the same problems with paralogy and deeply coalescing alleles as before. If the ad hoc assumptions are considered too problematic to believe that these templates were in the same genome ready for recombination, then they are also not available for template switching. However, template switching involving distinct allele lineages brought into the population via hybridisation may have occurred. Whether the recombination occurred in vivo or during amplification does not affect our conclusion that hybridisation was the cause of the distinct sequences, but it does change when and where the recombination occurred.
Contamination with other samples in this study can also be ruled out, because the branch lengths of the recombined piece in the 3′ partition analysis shows many changes since recombination, which is clearly at odds with this explanation. Contamination should result in identical or near-identical sequences in the recombined part to the source of the contamination (which is most likely from the same study, given PCR primer specificities).

Implications of hybridisation on barriers to gene flow and mode of speciation
Our results imply rare hybridisation between lineages that have clearly diverged. This may offer some insight into their mode of speciation. Barrier formation can reduce or cut-off gene flow, simply due to isolation by distance, but generally only genetic drift acts to differentiate the now separated gene pools (Seehausen et al. 2014). Secondary contact can often be expected to result in some gene flow, limited by the amount of time since the barrier was formed.
In contrast, other modes of speciation not involving physical barriers to gene flow include chromosomal, ecological or ploidy-driven speciation. These are expected to result in intrinsic genetic isolation fairly early in the speciation continuum, such that secondary contact is unlikely to result in gene flow (Coyne and Orr 2004;Seehausen et al. 2014;Nosil et al. 2008;Servedio et al. 2011;Muir et al. 2012;Guirao-Rico et al. 2017). Given that hybridisation appears to be the cause of recombination (either in vivo or in vitro) in these Silene individuals, and that the species involved are distinct (Toprak et al. 2016), it appears that secondary contact has resulted in some but limited gene flow. The BS. atocioides^(pink) and BS. aegyptiaca^(yellow) clades occupy vicariant geographical areas, adding to the probability that speciation by geographical isolation is the most likely mode of speciation between the ancestors of these species.

Biogeographic implications of hybridisation across the Anatolian Diagonal
Anatolia is biogeographically interesting because it is a hotspot of biological diversity (Ansell et al. 2011) and because it stands as a bridge between Europe and the Near East, allowing a mixture of the biotas from these areas (Davis 1965(Davis -1985. Stretching across central-southern to northeastern Anatolia is the Anatolian Diagonal (AD), a chain of mountains known to be a barrier to biological exchange between the European and Near Eastern sides of the region (Davis 1965(Davis -1985Çiplak et al. 1993), although only for some taxa (Gündüz et al. 2007;Bilgin 2011).
In some taxa, the AD is thought to have separated eastern and western refugia during the last glacial maximum (LGM) (Bilgin 2011), when mountains above 2000 m were glaciated (Atalay 1996). However, in the taxa examined here, the clade on the western side of the AD and the clade on the eastern side (and the mountains of the Diagonal itself) include two and seven species, respectively. This much greater diversity (compared to phylogeographic partitioning of haplotypes within a single species) suggests that the processes that drove speciation between these clades was much earlier than the LGM. However, given the very high sequence diversity observed (and therefore probably high local rates) coupled with the absence of a reliable local calibration point for this group, we cannot pinpoint divergence times any more precisely than this. In the absence of more precise divergence times, it becomes rather speculative to link abiotic factors to clade divergences, but a few possibilities can at least be discussed.
The first of these possibilities is the refilling of the Mediterranean basin at the end Messinian, ca. 5.3 Ma (Garcia-Castellanos and Villaseñor 2011). As the sea level was much lower before the refilling took place, for about 600,000 years (Garcia-Castellanos and Villaseñor 2011), there would have been more land at lower altitudes available as habitat. This could have allowed a more widespread and possibly continuous distribution of the ancestral species between what is now southern Anatolia and the Levant. With the catastrophic refilling of the basin, the ancestor may have been split between these two endpoints and diverged as a consequence of the isolation (i.e. vicariant speciation).
Another process with an overlapping time frame is the folding, uplifting and increased volcanism in the eastern Anatolian plateau that raised the height of the plateau and of the AD in particular, thought to have taken place around the Fig. 5 Proposed model of the introgression and recombination between the S. atocioides (pink) and S. aegyptiaca (yellow) S. aegyptiaca lineages. Event A is a hybridisation episode leading to the introgression of a S. atocioides (pink) allele of EST09 into the S. aegyptiaca (yellow) population. This would have occurred at ca. 0.026 relative time units. Event B is recombination among alleles in the S. aegyptiaca (yellow) population over time since introgression to produce several combinations between the single S. atocioides (pink) allele lineage from the source population and several S. aegyptiaca (yellow) allele lineages present in the S. atocioides (pink) population. We presuppose that a recombination hotspot is present to generate approx. the same breakpoint in each case. This results in alelle combinations with monophyletic S. atocioides (pink) alleles (all derived from pink allele 1) in the recombinants, but more diverse S. aegyptiaca (yellow) alleles (derived from yellow alleles 1, 2 and 3) in the recombinants, consistent with the gene tree observations. Alternatively, two independent recombinations (rather than three) and subsequent divergence of one S. aegyptiaca (yellow) allele lineage in the recombinants could explain the gene trees mid-to late-Miocene (Şaroğlu and Yilmaz 1986). If the ancestral species was more restricted to mountain slopes, rather than able to thrive at near sea level altitudes, then the increasing height of the Anatolian Diagonal may at first have increased the available habitat area by increasing the area of mountainous regions. Then later, as the peaks became too high, the presence of glaciers on the highest peaks during glacial times (e.g. from the early Pleistocene, c. 2.5 Ma; Ansell et al. 2011) could have caused the mountain chain itself to become a barrier between the western and eastern populations, again leading to speciation.
Any of the glacial cycles up to the LGM could have led to the production of a barrier along the Anatolian Diagonal, but perhaps the initial creation of a barrier would have produced the isolation necessary to cause speciation, if the ancestral species was already in position on either side of the mountains at that time.
Changes in sea level during the Pleistocene as a result of glacial cycles could also have opened up more habitats away from the glaciated mountains more recently than the end Messinian. Thus, a low-altitude corridor may have repeatedly opened and closed from 2.5 Ma onwards, forming a connection at times and a barrier at times more recently than the initial refilling of the basin. Therefore, an ancestral species adapted to lower altitudes could have been split between the western and eastern regions discussed here at many different times.
In summary, there have been several opportunities for ancestral populations adapted to either lower or higher altitudes to have been split between regions to the west and east of the AD since the end of the Miocene. Indeed, initial divergence without physical barriers is possible, but perhaps less plausible, given the relatively strong geographical pattern. We are therefore left with many reasonable hypotheses to test, if the means to achieve more precise dating becomes available, but no clear indication with the data at hand as to which of these might be more plausible. But in any case, the evidence for ancient hybridisation that we have found, via recombination in EST09, shows that whichever barrier may have driven speciation, it was a semi-porous barrier that has allowed for at least some exchange at some time(s) after the initial divergence among the S. atocioides (pink) and S. aegyptiaca (yellow) lineages.

An example of gradual speciation?
Hybridisation among close relatives with similar morphologies can complicate the distinctiveness of affected species. Evidence for hybridisation among closely related plants has been steadily accumulating in various groups of angiosperms (Rieseberg 1997;Mallet 2005). Where speciation has been caused by abiotic barriers that are permeable, secondary contact that results in some gene flow is perhaps to be expected. Our results, with many fine-grained but conflicting clusters suggested by the DISSECT analysis, within each of the nine that are subtended by highly supported branches (Toprak et al. 2016), are suggestive of gradual speciation. The signal of an isolated ancient introgression event that we present here, well after the primary divergence of the species involved, may represent the Blast gasp^of gene flow between these gradually diverging gene pools.