Evidence for extensive hybridisation and past introgression events in feather grasses using genome-wide SNP genotyping

The proper identification of feather grasses in nature is often limited due to phenotypic variability and high morphological similarity between many species. Among plausible factors influencing this issue are hybridisation and introgression recently detected in the genus. Nonetheless, to date, only a bounded set of taxa have been investigated using integrative taxonomy combining morphological and molecular data. Here, we report the first large-scale study on five feather grass species across several hybrid zones in Russia and Central Asia. In total, 302 specimens were sampled in the field and classified based on the current descriptions of these taxa. They were then genotyped with high density genome-wide markers and measured based on a set of morphological characters to delimitate species and assess levels of hybridisation and introgression. Moreover, we tested species for past introgression and estimated divergence times between them.

Results

Our findings demonstrated that 250 specimens represent five distinct species: S. baicalensis, S. capillata, S. glareosa, S. grandis and S. krylovii. The remaining 52 individuals provided evidence for extensive hybridisation between S. capillata and S. baicalensis, S. capillata and S. krylovii, S. baicalensis and S. krylovii, as well as to a lesser extent between S. grandis and S. krylovii, S. grandis and S. baicalensis. We detected past reticulation events between S. baicalensis, S. krylovii, S. grandis and inferred that diversification within species S. capillata, S. baicalensis, S. krylovii and S. grandis started ca. 130–96 kya. In addition, the assessment of genetic population structure revealed signs of contemporary gene flow between populations across species from the section Leiostipa, despite significant geographical distances between some of them. Lastly, we concluded that only 5 out of 52 hybrid taxa were properly identified solely based on morphology.

Conclusions

Our results support the hypothesis that hybridisation is an important mechanism driving evolution in Stipa. As an outcome, this phenomenon complicates identification of hybrid taxa in the field using morphological characters alone. Thus, integrative taxonomy seems to be the only reliable way to properly resolve the phylogenetic issue of Stipa. Moreover, we believe that feather grasses may be a suitable genus to study hybridisation and introgression events in nature.

View this article's peer review reports

Conservation in the face of hybridisation: genome-wide study to evaluate taxonomic delimitation and conservation status of a threatened orchid species

Article 29 January 2021

Morphological differentiation despite gene flow in an endangered grasshopper

Article Open access 16 October 2014

Are there hybrid zones in Fagus sylvatica L. sensu lato?

Article Open access 05 December 2023

Background

The proper delimitation of species plays an important role in taxonomy as well as in studies related to speciation, biogeography and ecology, leading to effective conservation and management of biodiversity. In the last two decades, traditional approaches relying mostly on morphological features have been supplemented by molecular data that boosted the discovery of new species. Although one estimate of the number of plant species is around 298,000 [1], recently it has been shown that the plant kingdom is comprised of at least 374,000 taxa [2]. Nowadays, many systematicists emphasise the need to apply multidisciplinary data, so-called integrative approaches or integrative taxonomy [3]. For instance, information from a variety of disciplines, e.g., morphology, biochemistry, cytogenetics and ‘omics studies, increases the reliability and validity in identifying taxa [4,5,6].

To date, among the molecular methods, DNA barcoding has been a widely utilised tool to identify taxa at different levels aiming not only to facilitate revisionary taxonomy, but also to broaden our understanding of molecular phylogenetics and population-level variations [7,8,9]. Among standard plant markers, chloroplast regions rbcL and matK and the nuclear internal transcribed spacer (ITS) locus have been proposed for DNA barcoding of land plants [10, 11]. Additionally, several non-coding plastid regions have been suggested as supplementary loci where further resolution is required [10].

Stipa is one of the largest genera in the grass subfamily Pooideae (Poaceae), currently comprising nearly 150 cool-season species with C₃ photosynthesis common in Eurasia and North Africa [12, 13]. Based on ITS and the plastid trnK region, the genus has been proven to be monophyletic [14, 15]. Nonetheless, the traditional barcodes are not able to validate the sectional subdivision in Stipa proposed, e.g., by Tzvelev [16, 17] or Freitag [18]. Recently, the nuclear intergenic spacer (IGS) region [19] and marker sets derived from whole chloroplast genomes of 19 species [20] were proposed for phylogenetic studies of feather grasses. Although these markers are more phylogenetically informative in comparison to the previously used barcodes, they are still unable to discriminate all taxa, causing unresolved nodes in the reconstructed trees [19, 20].

One of the plausible explanations for this unresolved branching in Stipa is that many feather grasses are of hybrid origin [16, 21,22,23]. Presently, hybridisation is considered to be widespread among at least 25% of plant taxa, mostly the youngest species [24]. This phenomenon is often accompanied by introgression via repeated backcrossing to one or both parental species that can lead to diversification and speciation [25, 26]. In grasses, hybrid speciation has been explicitly studied at the genome level, e.g., in Triticum [27] and Brachypodium [28]. Nonetheless, previously many hybrids and introgressed individuals were characterised exclusively based on morphology that limited their successful identification in nature [29, 30]. In addition, hybridisation may lead new organisms not only to intermediate traits of parental species but also to extreme, or transgressive, phenotypes [31] that complicate their proper taxonomic treatment. In feather grasses, the hypothesis of hybrid origin of some species was initially tested using multivariate morphological analyses [23, 32] and, more recently, applying molecular markers among genetically closely related species in the Stipa heptapotamica complex [33] as well as within two genetically distant species, S. krylovii and S. bungeana [34]. Furthermore, due to the usage of integrative taxonomic approaches, it was shown that some Stipa taxa, previously assigned to S. richteriana and S. grandis, appeared to be cryptic species [33, 35]. Thus, taking into account that ca. 30% of feather grasses could be of hybrid origin [13] and that cryptic species are present, integrative taxonomy seems to be the only reliable way to properly resolve the phylogenetic issue of Stipa.

Importantly, the advent of next generation sequencing technologies, primarily Illumina, and the continuously reducing sequencing cost have facilitated the implementation of genomic data in studies of non-model organisms. For instance, high-throughput techniques based on restriction enzymes, e.g., RADseq [36] and genotyping-by-sequencing [37], which have been foremost used in agricultural species [38], are currently widely applied in phylogenetics and studies related to hybridisation in many wild plant genera with little or no previous genomic information [39,40,41]. Recently, a promising result has also been demonstrated in Stipa where the usage of the DArTseq technique resulted in an increased number of markers that was several 100-fold higher than in the previous genomic studies [34].

During field studies on steppe communities, we observed high morphological variability in populations of genetically related plants. In particular, we noticed that some specimens of feather grasses representing S. capillata, S. grandis and S. krylovii seemingly share mixed morphological characteristics between these species, while taxon S. baicalensis is frequently observed within populations of the aforesaid taxa and resembles an intermediate phenotype between S. grandis and S. krylovii or S. capillata and S. krylovii. The above-mentioned species have wide distribution ranges (Fig. 1 and Supplementary Table S1). Specifically, S. capillata is the most widespread taxon within the genus, grows on the dry grasslands and is common in Siberia, Western Asia and is also present in a limited number of refugia in Europe and North Africa. Two species, S. baicalensis and S. grandis, share similar ranges in southwestern Siberia, in the Baikal region, in the south part of Zabaykalsky Krai, in Mongolia and in northeastern China; S. baicalensis is also present in the south part of the Russian Far East, whereas S. grandis occurs in Central China and Tibet. Finally, S. krylovii grows in Siberia, Mongolia, China, Northern Nepal, Southern Tajikistan, Eastern Kazakhstan and Eastern Kyrgyzstan [13, 42].

We hypothesise that the observed variability in S. baicalensis, S. capillata, S. grandis and S. krylovii is due to the presence of interspecific hybrids and that may lead to species misidentification based on the current descriptions of these taxa [16, 43,44,45]. Thus, in the current study, we aim to use an integrative taxonomy approach to (1) delimitate species and test if S. baicalensis is a hybrid between S. grandis × S. krylovii or S. capillata × S. krylovii; (2) assess levels of hybridisation and introgression (if present) between the examined taxa and populations at the molecular level; (3) estimate divergence times between the studied taxa; (4) obtain insight into the extent of hybridisation between these species at the morphological level and (5) assess whether morphological characters can be used to identify hybrids in the field.

Results

DNA-based species delimitation

The DArTseq technique was applied to obtain a total of 8660 SNP markers to infer the genetic structure of 302 Stipa specimens. Firstly, analyses of genetic clustering with an unweighted pair group method using arithmetic average (UPGMA) and fastSTRUCTURE revealed five major clades corresponding to morphospecies S. glareosa, S. capillata, S. grandis, S. krylovii and S. baicalensis (Fig. 2). According to the fastSTRUCTURE analysis, the first and the fourth clades consisted exclusively of pure specimens of S. glareosa and S. krylovii, respectively. The remaining clades beside pure specimens of S. capillata, S. grandis and S. baicalensis included hybrid individuals. In particular, the second clade comprised pure specimens of S. capillata and the admixed individuals S. capillata × S. baicalensis and S. capillata × S. krylovii. The third cluster consisted of pure specimens of S. grandis and hybrids S. grandis × S. krylovii and S. grandis × S. baicalensis. The fifth clade included pure specimens of S. baicalensis and the admixed individuals S. baicalensis × S. krylovii.

In total, fastSTRUCTURE inferred 52 individuals with an admixture of two genetic clusters including an exception of S. capillata × S. baicalensis (0454631) that had a minor proportion (0.02) of a third cluster representing S. krylovii. Among pure individuals only one specimen of S. krylovii (0454646) had an insignificant admixed proportion (0.03) of S. grandis. Noteworthy, the vast majority of admixed individuals (49 or 94%) had a proportion of membership in the range from 0.46 to 0.54 indicating F1 hybrids or later generations of hybrids that have no backcrossing to the parental species. The remaining admixed samples represented: (1) one individual (0477009) was formed by a 0.78–0.22 admixture between S. baicalensis and S. krylovii evidencing a first-generation backcross (F1× S. baicalensis), (2) one individual (000948) was shared between S. grandis (0.88) and S. krylovii (0.12) indicating a second-generation backcross (first-generation backcross × S. grandis), (3) one individual (000956) was admixed between S. grandis (0.64) and S. krylovii (0.36) that may suggest a first-generation backcross (F1× S. grandis) or a more complex backcross to S. grandis via different intermediate combinations.

A consistent result was also found with a principal coordinates analysis (PCoA). The first three axes explained 29.6, 19.9 and 19.2% of the total genetic divergence within the studied taxa, respectively. According to the PCoA, pure individuals were grouped into five markedly differentiated groups correspondingly to their taxonomic classifications (Fig. 3; an interactive version of the three-dimensional plot can be accessed via https://plot.ly/~eugenebayahmetov/40/). The remaining hybrids F1 had intermediate positions between their parental species. Two hybrid individuals S. grandis × S. krylovii (000948 and 000956) were grouped closer to S. grandis reflecting a higher proportion of membership with the first taxon established earlier with fastSTRUCTURE. Similarly, an admixed individual S. baicalensis × S. krylovii (0477009) with the proportion of 0.78 and 0.22 was closer to S. baicalensis.

Hybrid generation identification

The NewHybrids analysis revealed a more complex pattern of hybridisation than it was inferred with fastSTRUCTURE. Among 16 admixed specimens of S. baicalensis × S. krylovii, previously assigned as F1, six individuals with posterior probabilities (PP) in a range of 0.84 and 1.00 were identified being F2 (F1× F1) hybrids suggesting that F1 hybrids are able to reproduce further. One specimen, S. baicalensis × S. krylovii (0477009), was proven to be a first-generation backcross (F1× S. baicalensis) having PP of 0.81. In addition, five mixed individuals had PP between two categories (F1 and F2 hybrids) in a range of 0.22–0.78 suggesting uncertainty in the assignment (Fig. 4a and Supplementary Table S2). These mixed assignment individuals may represent a more advanced hybrid generation than can be detected by NewHybrids. Within 14 hybrids of S. capillata × S. krylovii, previously assigned to F1 hybrids, we detected six individuals of F1 (PP of 0.87–1.00) and one specimen was identified as an F2 hybrid with PP of 0.83. The remaining seven individuals had mixed assignments in a range of 0.39–0.73 for the F1 class and 0.27–0.61 for the F2 class, respectively (Fig. 4b). The analysis also demonstrated that 13 out 14 hybrids of S. capillata × S. baicalensis were F1 (PP of 0.86–1.00) and one individual remained unclassified sharing PP (0.54 and 0.46) between F1 and F2 categories (Fig. 4c). Among S. grandis × S. krylovii only one F1 hybrid was detected (PP of 0.91), two individuals were assigned to the F2 class (PP of 0.83 and 1.00) and two specimens were classified as first generation backcrosses (F1× S. grandis) having PP of 0.88 and 0.99. Although the specimen 000948 was inferred to be an F1 backcross, it is more plausible that it represents rather a second-generation backcross established by fastSTRUCTURE due to NewHybrids being unable to detect more advanced backcrosses than F1. Additionally, one individual had mixed assignments between the F1 (PP of 0.27) and F2 (PP of 0.72) classes (Fig. 4d). Finally, the only hybrid detected for S. grandis × S. baicalensis appeared to be a first generation hybrid with PP of 1.00 (Fig. 4e).

Testing for introgression

A total of 6894 SNP markers were used to test for reticulation events between the studied taxa. Due to five different admixed combinations detected by fastSTRUCTURE and NewHybrids, we tested all possible four-species combinations regardless of their phylogenetic positions (Fig. 2). The results of the f₄ statistic suggest no gene flow between S. capillata and the remaining species because of negligible deviations from the expected 50/50 ratio of BABA/ABBA patterns and the lowest Z-scores of any tests (Table 1). This finding disagrees with the presence of contemporary hybrids S. capillata × S. krylovii and S. capillata × S. baicalensis inferred with fastSTRUCTURE and NewHybrids. Nonetheless, it can be explained by the fact that all identified admixed individuals were excluded from this analysis. On the other hand, introgression events were suggested between S. grandis and S. baicalensis (combinations 5 and 11), S. grandis and S. krylovii (combinations 4, 6, 7 and 8), S. krylovii and S. baicalensis (combinations 9 and 12). Additionally, when S. grandis, S. krylovii and S. baicalensis were analysed together (combinations 4, 7 and 10) the ratio of BABA/ABBA patterns were either almost equal (combination 10) or relatively lower (combinations 4 and 7) compared to the other tests that indicated gene flow among these species. One potential explanation is that these species are involved in introgression at the same rate, which theoretically cancel out each other.

Table 1 Test for introgression between the studied species using 6894 SNPs

Full size table

Population differentiation

A total of 3483 SNP markers were used to investigate the genetic differentiation in populations of S. baicalensis, 6288 SNPs in S. capillata, 4635 SNPs in S. grandis and 6912 SNPs in S. krylovii (Supplementary Fig. S1). The pairwise Fst values demonstrated strong differentiation among four populations of S. baicalensis, while the results of STRUCTURE and PCoA revealed two and three genetic clusters, respectively (Fig. 5a and Supplementary Fig. S2), where populations 1 and 4 are merged (Fst of 0.32, Supplementary Table S3) regardless of the fact that the distance between them is more than 1000 km. Additionally, the second most likely K according to STRUCTURE was K = 3 indicating that this number of clusters is also a likely option. Relatively strong differentiation was also shown for populations of S. capillata with an exception of populations 5 and 6 with a moderate Fst value of 0.13, suggesting potential gene flow. According to the PCoA, almost all individuals were grouped together excluding population 3 and a few specimens of populations 2 and 5. Nonetheless, the first two axes of PCoA explained only 18% of the total genetic divergence within the specimens. On the other hand, STRUCTURE supported K = 4 as the best fitting model, while two and nine clusters were also among the probable options (Fig. 5b and Supplementary Fig. S2). Within S. grandis, evidence for weak differentiation was shown for geographically close populations 5 and 6 (Fst of 0.10) as well as for 7 and 8 (Fst of 0.08), while the first two axes of the PCoA explained 23.3% of the variation and revealed the close genetic relationship between geographically distant populations 1 and 2 (Fst of 0.42). In addition, based on the second axis of the PCoA, populations 3 and 4 were distant to each other as well as to the remaining populations (Fig. 5c). Further, the STRUCTURE analysis suggested that K = 2 was the most probable number of separate clusters within S. grandis (Supplementary Fig. S2), where one pure cluster represented individuals from populations 7 and 8, while 9 out of 13 specimens of population 3 and 4 constituted the second pure group. The rest of S. grandis specimens were admixed between these pure clusters. Lastly, based on the PCoA individuals of S. krylovii were clustered into the two main groups representing population 8 from Kyrgyzstan and the remaining populations from Russia (Fig. 5d). Although few specimens of population 1 were genetically more distant to the other individuals, the second axis of PCoA explained only 5.4% of the total genetic divergence. The pairwise Fst values also supported the division within S. krylovii into two groups indicating the strong differentiation of population 8 (Fst in a range of 0.37–0.46) and a moderate or near to moderate differentiation among the remaining populations (Fst in a range of 0.06–0.19). Similarly, the STRUCTURE analysis revealed two genetic clusters (Supplementary Fig. S2) that described the population structure the best: the first one represented population 8 from Kyrgyzstan, while the second cluster comprised populations from Russia.

Divergence-time estimation

The SNAPP phylogeny based on 2717 SNP markers among pure individuals revealed largely the same topology as the UPGMA dendrogram, except for the pair S. grandis and S. krylovii that were grouped together, while S. baicalensis was a sister taxon (Fig. 6 and Supplementary Fig. S3). The result suggests that the potential split between S. capillata and three species, namely, S. grandis, S. krylovii and S. baicalensis, took place approximately 1.07 Mya with the 95% Highest Posterior Density interval (HPD) of 1.51–0.71 Mya. The most recent common ancestor for S. grandis, S. krylovii and S. baicalensis was inferred to be 0.79 Mya (95% HPD: 1.12–0.53 Mya), whereas the lowest divergence time of 0.73 Mya was registered for S. grandis and S. krylovii (95% HPD: 1.02–0.48 Mya). The chronogram also indicates that diversification within S. capillata, S. krylovii and S. baicalensis started at relatively the same time, ca. 130–114 kya (95% HPD: 181–74 kya), while S. grandis was established to be the youngest species (96 kya, 95% HPD: 137–63 kya). Lastly, although the topologies within the species had large uncertainty, some nodes had comparatively high Bayesian posterior probabilities (BPP ≥ 0.80; Supplementary Fig. S3). In particular, individuals of S. capillata from Kyrgyzstan started to differentiate 97 kya (95% HPD: 137–58 kya), while the well-supported split (BPP of 0.92) between specimens from localities 5 (Russia, the Republic of Khakassia) and 19 (Russia, the Republic of Buryatia) took place 86 kya (95% HPD: 122–53 kya). The most recent common ancestor for individuals of S. krylovii from Kyrgyzstan was inferred to be 62 kya (95% HPD: 94–35 kya), whereas specimens from localities 22 (Mongolia) and 24 (Russia, Zabaykalsky Krai) had the potential split 68 kya (95% HPD: 99–40 kya). Interestingly, specimens of S. baicalensis from localities 11 and 13 (both from Russia, the West shore of Lake Baikal) and individuals from the remaining localities had nearly the same divergence times of 97 kya (95% HPD: 140–60 kya) and 93 kya (95% HPD: 132–58 kya), respectively. Additionally, the potential split between populations of S. grandis from the Republic of Khakassia (locality 5) and the Republic of Buryatia (locality 18) took place 65 kya (95% HPD: 91–39 kya), while the well-supported split (BPP of 0.95) between specimens from localities 23 and 24 (both from Russia, Zabaykalsky Krai) took place 58 kya (95% HPD: 86–34 kya).

Morpho-molecular analysis

As non-parametric Spearman correlation coefficients did not demonstrate any strong correlation (> 0.90) between the measured variables (Supplementary Fig. S4), we retained all morphological characters (Table 3) for a factor analysis of mixed data (FAMD) and analyses of notch plots. Subsequently, to investigate if the observed phenotypes are congruent with molecular data, we supplemented the result of the FAMD analysis with the genetic clusters inferred by fastSTRUCTURE for the best K = 5. As a result, the FAMD revealed five markedly differentiated groups (Fig. 7) of operational taxonomic units (OTUs) in accordance with the detected clusters using the SNP data (Figs. 2, 3 and 4). The first four dimensions explained 31.0, 18.9, 10.9 and 8.1% of the total variability, respectively. The first three axes are mainly composed by the contribution of 11 quantitative and four qualitative variables (Supplementary Table S4).

Importantly, due to having the genetic clusters assigned by fastSTRUCTURE, the two-dimensional plot revealed the slight overlapping of OTUs belonging to the pure species S. baicalensis and S. krylovii, whereas OTUs representing the admixed specimens S. baicalensis × S. krylovii were present in both clouds of the parental taxa (Fig. 7a). Furthermore, hybrids S. capillata × S. baicalensis were also present in both clouds of the pure species. On the other hand, all the admixed individuals S. capillata × S. krylovii were grouped together with only one parental species, S. capillata. Interestingly, hybrids between S. grandis and S. krylovii were mainly grouped together with OTUs of S. baicalensis with an exception of the first- and the second-generation backcrosses. The only hybrid detected between S. grandis and S. baicalensis was clustered together with the pure individuals of the former taxon. A more clear dispersal of the pure species can be seen in the three-dimensional plot, where differences between the studied species are explained by the third principal axis (Fig. 7b; an interactive version of the plot available at https://plot.ly/~eugenebayahmetov/42/). Additionally, all combinations of two axes plots for the four dimensions are present in Supplementary Fig. S5.

The result of FAMD and notch plots of the quantitative variables demonstrated that the studied species can be differentiated morphologically mainly based on the length of the lower segment of the awn (Col1L), the distance from the end of the dorsal line of hairs to the top of the lemma (DDL), the length of the anthecium (AL), the length of the middle segment of the awn (Col2L), the length of the callus (CL), the length of ligules of the internal vegetative shoots (LigIV), the length of ligules of the middle cauline leaves (LigC) and the length of hairs on the top of the lemma (LHTA). In addition, the length of hairs on the lower segment of the awn (HLCol1), the length of hairs on the middle segment of the awn (HLCol2) and the length of the callus base (CBL) can aid to distinguish S. glareosa from the remaining taxa (Supplementary Fig. S6). For instance, the notch plot of Col1L showed significant differences between means and strong evidence of differing medians within the pure species except the pair S. baicalensis and S. capillata, while the hybrid individuals had mostly intermediate positions between the parental species except specimens of S. capillata × S. baicalensis that were significantly different from the samples of the pure taxa. The differences across all quantitative variables between individuals of the pure species and the hybrids can be better evaluated in the interactive box plots presented in Supplementary File 1.

Among the qualitative variables, the main contribution to the axes of the FAMD had the abaxial surface of vegetative leaves (AbSVL), the type of hairs on the top of the anthecium (HTTA), the type of the awn geniculation (AG) and the presence of hairs below nodes (PHBN) (Supplementary Table S4). For instance, vegetative leaves with prickles were common in S. glareosa (all samples), S. capillata (61 out of 66 samples), S. capillata × S. krylovii (12 out of 14 samples), S. capillata × S. baicalensis (10 out of 14 samples), less frequent in S. grandis (2 out of 54 samples), S. krylovii (5 out of 81 samples), S. baicalensis × S. krylovii (2 out of 14 samples) and totally absent in S. baicalensis, S. grandis × S. krylovii and S. grandis × S. baicalensis (Supplementary Fig. S7).

Thus, based on the results of the molecular analyses and the FAMD combining both phenotypic and SNP data, we were able to differentiate the pure species and the hybrid individuals at the morphological level. We established that using the traditional identification keys [16, 43,44,45] 71 out of 302 specimens had been misidentified, mostly due to their hybrid nature (Supplementary Table S5). In particular, 47 samples previously identified as pure species appeared to be hybrids. The remaining 24 specimens earlier were classified either as hybrids (15 samples) or misleadingly assigned to S. baicalensis (9 samples). Interestingly, in the latter case, all individuals were previously reported from the northeastern part of Kazakhstan [46].

In general, S. baicalensis was the most problematic species for taxonomic identification comprising 54 doubtful samples. Specifically, the above-mentioned specimens from Kazakhstan genetically were proven to be pure S. capillata, 10 specimens appeared to be hybrids S. capillata × S. baicalensis, 8 were S. baicalensis × S. krylovii and 10 were hybrids between S. capillata and S. krylovii. On the other hand, specimens previously identified as hybrids S. baicalensis × S. krylovii (3 samples) and S. grandis × S. krylovii (2 samples) were genetically assigned to the pure species S. baicalensis. Oppositely, specimens morphologically identified as S. krylovii (8 samples), S. grandis (1 sample) and S. capillata (1 sample) appeared to be hybrids S. baicalensis × S. krylovii (6 samples) and S. capillata × S. baicalensis (2 samples), S. grandis × S. baicalensis and S. capillata × S. baicalensis, respectively. Additionally, one specimen S. capillata × S. grandis was proven to be S. capillata × S. baicalensis. At last, one individual S. capillata × S. baicalensis was genetically verified to be the pure species S. capillata.

The remaining doubtful samples morphologically were either misleadingly assigned as the pure species or as taxa of hybrid nature. For instance, 4 specimens of S. capillata were hybrids between S. capillata and S. krylovii, while specimens of S. grandis (2 samples) and S. krylovii (2 samples) were shown to be genetically admixed as S. grandis × S. krylovii. In opposite, 9 individuals identified as S. capillata × S. krylovii appeared to be the pure species S. capillata. Interestingly, only 5 out of 52 hybrid taxa were properly identified based on morphology including S. baicalensis × S. krylovii (3 samples) and S. grandis × S. krylovii (2 samples). The only one taxon that had not any questionable individuals was S. glareosa representing the section Smirnovia.

Discussion

The current understanding of taxonomy and species limits in Stipa is still largely based on morphological characters. Our study highlights the necessity of using molecular tools to properly identify taxa and detect processes underlying speciation. This is of particular relevance in hybrid zones where ongoing hybridisation and introgression may lead admixed individuals to phenotypes similar to one of the parental species, complicating identification of such taxa using morphological characters alone. Furthermore, integrative studies suggest that apparently intermediate phenotypes between two species are not necessarily hybrids [47]. On the other hand, as indicated in the present work, although some interspecific hybrids have intermediate characters between parental taxa, their phenotypic traits can also overlap with non-parental species leading to misidentification. Given the circumstances, here we utilised an integrative approach combining genome-wide data and morphology to delimitate species and ascertain the extent of hybridisation in feather grasses.

The study clearly illustrates that a molecular-based analysis, e.g., such as fastSTRUCTURE, combined with a factor analysis of mixed data, utilising both quantitative and qualitative variables, can largely resolve the problem with species identification in the face of ongoing hybridisation and introgression. Primarily, such an approach allows to visualise and easily trace if the observed phenotypes are congruent with molecular data. Besides that, this approach may aid in selecting a set of traits that can be useful for species identification in the field. Our findings clearly demonstrate that the studied individuals can be clustered into five species groups representing S. capillata, S. baicalensis, S. glareosa, S. grandis and S. krylovii. Thus, here we found no evidence that S. baicalensis is of hybrid origin from S. grandis × S. krylovii or S. capillata × S. krylovii but is instead a genetically distinct species. The general branching of the phylogenetic trees is in good agreement with the current taxonomic classification. In particular, a representative of the section Smirnovia, S. glareosa, was genetically distant to the remaining species from the section Leiostipa. Nevertheless, our result contradicts a previous research, where S. capillata, S. krylovii and S. baicalensis represented one clade and S. grandis was a sister taxon [19], while here, such a sister taxon was S. capillata. The current result is likely more accurate due to applying several thousand SNPs across the genome in comparison to only one nuclear locus in the above-mentioned study. Additionally, we demonstrated that the potential split between S. capillata and the remaining representatives of Leiostipa took place approximately 1.07 Mya, which is similar to our previous estimate of 1.73 Mya based on the nucleolar organising regions [48]. Here, we also reported for the first time that diversification within species S. capillata, S. baicalensis, S. krylovii and S. grandis started ca. 130–96 kya (95% HPD: 181–63 kya). These ages may correspond to the potential window of time between the Last Interglacial period, which began around 130 kya, and the Last Glacial Period (LGP), which started about 110 kya. Thus, the observed pattern is similar to dispersing events reported for different taxa across the plant kingdom [49, 50] suggesting climatic changes as a feasible factor in the current distribution of feather grasses. Of note, due to the divergence times that were inferred in SNAPP, which uses the multi-species coalescent model ignoring possible introgression, the confidence may be exaggerated. On the other hand, although introgression does cause biased errors in coalescent-based species tree inference [51,52,53], it should not affect the estimates in the present study, since all admixed individuals were excluded from the analysis.

Importantly, the results of molecular analyses were congruent and provided evidence for the existence of hybridisation between pairs S. capillata × S. baicalensis, S. capillata × S. krylovii, S. grandis × S. krylovii, S. grandis × S. baicalensis and S. baicalensis × S. krylovii. The presence of F2 or more advanced hybrid generations suggests that F1 individuals are able to reproduce further. This observation is in agreement with our previous findings on hybridisation within the genus [32, 33], where a direct approach proved that a hybrid taxon S. heptapotamica produces fertile pollen grains and is capable of backcrossing to primarily one parental species [33]. Indeed, here we detected backcrosses S. baicalensis × S. krylovii and S. grandis × S. krylovii to their former parental species. Moreover, the analysis of BABA/ABBA patterns among species revealed signs of past introgression between S. baicalensis and S. krylovii, S. baicalensis and S. grandis, S. krylovii and S. grandis. Taking into account the diversification times of these species, we can hypothesise that if such a gene flow occurred it was relatively recent in evolutionary terms and seemingly still present between S. baicalensis and S. krylovii, as well as in pair S. grandis and S. krylovii. Nevertheless, we treat our BABA/ABBA analysis with caution due to such a test originally being applied in human studies whose genome is available at the chromosome level and the number of SNPs was remarkably higher than used here. Thus, we intend to reassess the past gene flow within these taxa when a more continuous genome will be obtained. Additionally, although here we did not detect any backcrosses between S. baicalensis and S. grandis, it cannot be excluded that such a combination exists in nature, especially since both species are mostly common in Mongolia and China; however specimens from there were not presented in the study. Lastly, our study revealed only unidirectional backcrossing of hybrid taxa either to S. baicalensis or S. grandis, but not to S. krylovii. Similarly, due to the relatively small sample size and the limited number of localities used here, we cannot draw any reliable conclusions concerning possible barriers to gene flow to the latter taxon.

According to our results, the populations’ expansion seemingly started during the LGP. Nonetheless, the assessment of genetic structures revealed signs of contemporary gene flow between populations across all species, despite significant geographical distances between some of them. For instance, populations of S. baicalensis, S. capillata and S. grandis from the eastern part of Khakassia and the southwestern area of Buryatia represented either one genetic cluster or were admixed between two. Among potential explanations are shifting their distributions in response to climate change, or seasonal migration of wild animals and livestock grazing. While we currently do not possess enough data to verify the first assumption, seeds of feather grasses usually are spread naturally by wind or water. On the other hand, they also can be frequently dispersed by the wool of mammals including, e.g., sheep, goats and horses [16, 18]. Although this study was not intended to explore population differences in detail, we believe that our findings regarding gene flow merit further studies in order to better understand the intraspecific variation and relationships among populations as well as to discuss potential consequences of such events.

Our results also illustrated a complex association between species at the morphological level. There are usually few phenotypic characters differentiating hybridising species and these characters are often functionally or developmentally correlated [54]. In the present case, although the studied species had a set of distinctive characters, the current identification keys do not provide a solution on distinguishing admixed individuals in the section Leiostipa. As a result, only 5 out of 52 hybrid taxa were properly identified based on morphology. Furthermore, the results demonstrated that the most problematic taxon is S. baicalensis, which was frequently misidentified either with S. capillata or a cross S. capillata × S. krylovii, while a few individuals were misleadingly assigned to S. grandis × S. krylovii. Additionally, several hybrids between S. baicalensis and S. capillata were determined as pure S. krylovii. Therefore, we believe that the identification keys should be revised in order to properly delimitate pure species and propose a taxonomic treatment for the hybrid taxa identified in the study. Moreover, for a more comprehensive morphological assessment we suggest using scanning electron microscopy that can assist in finding unique ultrastructures among pure and admixed individuals.

Although here we highlight that morphological characters alone cannot be used to properly identify hybrids and backcrossed individuals in the field, it is a common issue in plants rather than an exception in feather grasses. For instance, in a study on three species of willows it was shown that based on phenotype only 5% of specimens were classified as introgressed individuals, which was much less than the 19% detected using SNP data [55]. Another research on tropical trees demonstrated that even limited genomic sampling, when combined with morphology and geography, can greatly improve estimates of species diversity for clades where hybridisation contributes to taxonomic difficulties [56]. Recently, an investigation on several pine species using SNP data derived from the DArTseq pipeline revealed that one species, previously considered of hybrid origin, is a genetically distinct species and provided insights into the challenges of solely using morphological traits when identifying taxa with cryptic hybridisation and variable morphology [57].

In grasses, hybridisation and introgression phenomena are still mainly studied in crop species, e.g., rice [58], wheat [59] and sugarcane [60]. To date, we have detected hybrids not only between genetically closely related species [33], but also among genetically distant Stipa taxa [13, 34]. The results present here and our previous findings helps us to shift toward thinking of the Stipa phylogeny as reticulate webs rather than a strictly bifurcating tree. Nonetheless, studying hybridisation in feather grasses is not only of particular interest to plant taxonomists. The presence of parental species, multiple generation hybrids and backcrosses in different proportions in a hybrid zone may indicate renewed sympatry providing important data for studying species boundaries and patterns of speciation [61]. From this point of view, Stipa may be a suitable genus to study these phenomena. Despite the increasing interest in feather grasses at the molecular level [19, 20, 48, 62,63,64,65], there is still a lack of substantial knowledge regarding, e.g., chromosome numbers of admixed and pure taxa in hybrid zones, fertility of pollen grains in F1 and later generation hybrids and backcrosses and genomic information related to specific loci contributing to reproductive barriers. We believe that only an integrative approach combining the aforesaid data can properly interpret evolutionary patterns and processes in feather grasses.

Conclusions

In the current study we revealed a complex taxonomic issue in feather grasses with variable morphology exhibited due to extensive hybridisation. Based on SNPs derived from genome-wide genotyping we detected five genetic groups representing separate morphospecies and showed that S. baicalensis is a genetically distinct species instead of a taxon of hybrid origin as it was previously hypothesised. We demonstrated the presence of F1 hybrids between S. capillata × S. baicalensis, S. capillata × S. krylovii, S. grandis × S. krylovii, S. grandis × S. baicalensis, S. baicalensis × S. krylovii and F2 individuals in S. capillata × S. krylovii, S. grandis × S. krylovii, S. baicalensis × S. krylovii indicating low levels of reproductive isolation in these species. We also discovered a few backcrosses S. baicalensis × S. krylovii and S. grandis × S. krylovii to their former parental species suggesting possible introgression among the taxa.

Furthermore, we detected reticulation events between S. baicalensis and S. krylovii, S. baicalensis and S. grandis, S. krylovii and S. grandis. On the other hand, we revealed signs of contemporary gene flow between populations of the species from the section Leiostipa. Another important outcome of the research is divergence date estimates inferred at the species and population levels. Here we deduce that diversification within the studied species started ca. 130–96 kya and hypothesise that climatic changes during the LGP were a driving force behind the current distribution of feather grasses. Importantly, here we also emphasise the usefulness of applying integrative approaches combining molecular and morphological data to delimitate species and detect hybridisation and introgression events in feather grasses. Finally, we conclude that Stipa may be a suitable genus to study these phenomena.

Methods

Plant material

In total, 302 fully developed Stipa samples were used for molecular and morphological studies. We gathered individuals representing the section Leiostipa (S. baicalensis, S. capillata, S. grandis and S. krylovii) from localities where these taxa grow together as well as from areas where they grow separately from each other (Fig. 1, Supplementary Table S1). Additionally, we included two populations of S. baicalensis previously reported from the northeastern part of Kazakhstan [46] and 13 specimens of S. glareosa belonging to the section Smirnovia that were found in localities 10, 12, 13 and 16. All voucher specimens used in the study are preserved at TK and KRA. All maps were visualised using ArcGIS Pro 2.7.1 (ESRI, Redlands, USA). The species distribution ranges were established based on the revision of herbarium specimens preserved at AA, ALTB, B, BM, BRNU, COLO, E, FR, FRU, G, GAT, GFW, GOET, IFP, K, KAS, KFTA, KHOR, KRA, KRAM, KUZ, L, LE, LECB, M, MO, MSB, MW, NY, P, PE, PR, PRC, TAD, TASH, TK, UPS, W, WA, WU and Z.

DNA extraction, amplification and DArT sequencing

This section was performed according to the previously reported procedures [34]. In brief, genomic DNA was isolated from dried leaf tissues using a Genomic Mini AX Plant Kit (A&A Biotechnology, Poland) and sent to Diversity Arrays Technology Pty Ltd. (Canberra, Australia) for the following genome complexity reduction using restriction enzymes and high-throughput polymorphism detection [66].

All DNA samples were processed in digestion/ligation reactions as described previously [66], but replacing a single PstI-compatible adaptor with two different adaptors corresponding to two different restriction enzyme overhangs. The PstI-compatible adapter was designed to include Illumina flowcell attachment sequence, sequencing primer sequence and “staggered”, varying length barcode region, similar to the sequence previously reported [37]. The reverse adapter contained a flowcell attachment region and MseI-compatible overhang sequence. Only “mixed fragments” (PstI-MseI) were effectively amplified by PCR using an initial denaturation step of 94 °C for 1 min, followed by 30 cycles with the following temperature profile: denaturation at 94 °C for 20 s, annealing at 58 °C for 30 s and extension at 72 °C for 45 s, with an additional final extension at 72 °C for 7 min. After PCR equimolar amounts of amplification products from each sample of the 96-well microtiter plate were bulked and applied to c-Bot (Illumina, USA) bridge PCR followed by sequencing on Hiseq2500 (Illumina, USA). The single read sequencing was performed for 77 cycles.

Sequences generated from each lane were processed using proprietary DArT analytical pipelines. In the primary pipeline, the fastq files were first processed to filter away poor quality sequences, applying more stringent selection criteria to the barcode region compared to the rest of the sequence. In that way the assignments of the sequences to specific samples carried in the “barcode split” step were reliable. Approximately 2.5 mln sequences per barcode/sample were identified and used in marker calling.

DArTseq data analysis

For the downstream analyses, we applied co-dominant single nucleotide polymorphism (SNP) markers processed in the R-package dartR v.1.5.5 [67] with the following parameters: (1) a scoring reproducibility of 100%, (2) SNP loci with read depth < 5 or > 50 were removed, (3) at least 95% loci called (the respective DNA fragment had been identified in greater than 95% of all individuals), (4) monomorphic loci were removed, (5) SNPs that shared secondaries (had more than one sequence tag represented in the dataset) were randomly filtered out to keep only one random sequence tag.