Introduction

Triticeae, one of the 41 tribes of the Poaceae family, is of great economic importance. Its most significant representatives are crops: wheat (Triticum aestivum), rye (Secale cereale) and barley (Hordeum vulgare), along with many forage species. On the other hand, some Triticeae are harmful weeds (e.g. Elymus repens). Such an important taxonomic group has been studied intensively. The number of recognized Triticeae taxa depends on the taxonomic criteria adopted (morphological, ecological, cytological, karyological or genetic), and no consensus has been reached on this issue so far. The ease of interspecific or even intergeneric hybrid formation within the Triticeae results in a high number of described taxa. Among these taxa both diploids and polyploids (either auto- or allopolyploids) can be found. The diploids and polyploids are linked by genetic relationships that form complex patterns of reticulate evolution. Moreover, genomes of unknown origin, distinct from the other genomes in the tribe, have been found in the Triticeae. All of the issues mentioned above complicate phylogenetic studies as well as species delimitation. Thus the number of Triticeae genera varies from 18 (based on morphology; Clayton and Renvoize 1986) to 38 (based on the genomic criterion; Löve 1984), depending on the adopted taxonomic criteria. The currently widely applied generic classification, based mainly on genome composition (Dewey 1984; Löve 1984), does not correspond to morphological divisions and leads to the delimitation of monotypic genera. However, it reflects the phylogenetic relationships and the evolutionary history more precisely.

Chloroplast DNA (cpDNA) is a rich source of phylogenetic information for the land plants. As the chloroplast genome is mostly maternally inherited among angiosperms, its sequence can help to identify the maternal forms of polyploid species. Thus cpDNA sequence variation has been used to track the maternal genealogy of allopolyploid angiosperm taxa for about 30 years. The cpDNA analysis has changed over time: starting from RFLP (restriction fragment length polymorphism) analysis of the whole chloroplast genome, through PCR–RFLP of selected loci, then sequencing of selected loci, to the recently applied whole chloroplast genome sequencing. The majority of phylogenetic and population studies conducted on chloroplast DNA so far have been based on the analysis of one or a few relatively short (1–2 kb) genomic fragments (e.g. Shaw et al. 2007).

A relatively new trend in the exploitation of chloroplast sequences is DNA barcoding, i.e. the use of short DNA sequences (from a few dozen to a few hundred bp) for species identification. For animals, DNA barcoding was developed on the basis of species specificity in the sequence of the mitochondrial CO1 gene (Hebert et al. 2003). Early studies on the CO1 gene helped to define the main requirements for an optimal DNA barcoding marker: length up to 500–700 bp (ease of amplification and sequencing), high interspecific sequence variation for a wide spectrum of organisms (e.g. for all land plants), low intraspecific sequence variation, ease of amplification with universal primers and large copy number in cells. These features allow species determination using standard procedures of molecular genetics with only minute amounts of tissue. Unfortunately, no DNA barcoding standard has been found for the land plants so far that would meet all of the above criteria as well as CO1 does for animals. This is related to the fact that plant mitochondrial genes proved to be unsuitable markers for DNA barcoding (Kress et al. 2005). Moreover, there is probably no single marker in the plant genome that exhibits all of the features required for DNA barcoding (Hollingsworth et al. 2011). Therefore, two chloroplast loci that meet the requirements for DNA barcoding only partially were chosen for species identification among the land plants: the fast-evolving matK (about 700 bp) and the slowly evolving rbcL (about 500 bp) (CBOL Plant Working Group 2009). The full-length sequences of matK and rbcL (both about 1.5 kb) were successfully applied to analyze the interspecific phylogenetic relationships among the land plants, including members of the Poaceae family (e.g. Hilu et al. 1999).

It was also suggested that other species-identifying markers should be used if the resolution of matK and rbcL proved insufficient. One of the most commonly proposed additional loci is the chloroplast trnH-psbA intergenic spacer (e.g. Kress et al. 2005). The aim of this study was to estimate the value of the phylogenetic information provided by the above-mentioned DNA barcoding markers (matK, rbcL) and by the candidate marker (trnH-psbA), as well as to evaluate the effectiveness of these plastid sequences in species identification. The di-, tetra- and hexaploid Triticeae taxa occurring in Eurasia were used as a model group.

Materials and methods

Plant material

A total of 28 Triticeae accessions and 1 outgroup accession [Brachypodium distachyon (L.) Beauv.] were analyzed in this study. Eight polyploid Triticeae species, represented by 14 specimens, were analyzed: Elymus caninus L., Elymus repens (L.) Gould, Hordelymus europaeus (L.) Harz, Hordeum murinum L., Leymus arenarius (L.) Hochst., Thinopyrum intermedium (Host) Barkworth & D.R. Dewey, Thinopyrum junceiforme (Á. Löve & D. Löve) Á. Löve, Thinopyrum junceum (L.) Á. Löve. These accessions came from Central Europe as well as from other areas such as Afghanistan, China and Greece. The nucleotide sequences obtained for the polyploid species were compared with the nucleotide sequences obtained from eight diploid Triticeae species (14 specimens): Hordeum bogdani Wilensky, Hordeum glaucum Steud., Hordeum bulbosum L., Lophopyrum elongatum (Host) Á. Löve, Psathyrostachys juncea (Fisch.) Nevski, Pseudoroegneria strigosa (M. Bieb.) Á. Löve, Taeniatherum caput-medusae (L.) Nevski, Thinopyrum bessarabicum (Savul. & Rayss) Á. Löve. These eight species represent different potential parental genomes of the polyploid species. Seventeen specimens were obtained from germplasm repositories (American National Plant Germplasm System, Pullman, Washington, USA; Nordic Gene Bank, Alnarp, Sweden) and 9 were collected by the authors from natural habitats; for the remaining 3 accessions the nucleotide sequences were obtained from GenBank (Table 1).

Table 1 The list of plant specimens and sequence accessions included in the study

DNA sequencing

Amplification of DNA fragments was conducted without DNA extraction using the Phire Plant Direct PCR Kit (Thermo Scientific) according to the dilution protocol. The PCR was carried out in a total volume of 20 µl. Reagent concentrations and thermal cycling parameters followed the manufacturer’s protocol. The primer sequences were taken from Yu et al. (2011) (matK, Tm = 48 °C), Erickson (www.barcoding.si.edu/pdf/informationonbarcodeloci.pdf, rbcL, Tm = 55 °C) and Shaw et al. (2005) (trnH-psbA, Tm = 56 °C). The PCR products were purified using the GeneJet PCR Purification Kit (Thermo Scientific) according to the manufacturer’s instructions. The purified PCR products were sequenced in both directions with the primers used for DNA amplification. Sequencing was performed either by custom sequencing services or in the authors’ laboratory with the GenomeLab DTCS Quick Start Kit (Beckman-Coulter) and the GenomeLab GeXP Genetic Analysis System (Beckman-Coulter). The resulting DNA sequences were checked and aligned with the BioEdit software (Hall 1999). Multiple sequence alignments are available from the corresponding author on request.

Data analysis

The analyzed dataset consisted of 87 nucleotide sequences: 78 resulted from the DNA sequencing experiments performed during this study and 9 were retrieved from GenBank (Table 1). Brachypodium distachyon (L.) Beauv. was used as an outgroup in the phylogenetic analysis. The BioEdit software was used to perform multiple sequence alignments. Prior to the phylogenetic analysis all indels, the poly(T) tract of variable length and the inversion found within the trnH-psbA spacer were manually removed from the alignment.
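The removal of indel positions described above was done manually in the study. As an illustration only, the following minimal Python sketch shows one way such a step could be automated for gapped columns. The function name `drop_gapped_columns` and the toy sequences are hypothetical and are not part of the original workflow; the poly(T) tract and the inversion would still have to be masked by position, which this sketch does not attempt.

```python
def drop_gapped_columns(alignment, gap_chars="-"):
    """Return a copy of the alignment without any column that contains a gap.

    `alignment` maps a sequence name to its aligned sequence (equal lengths).
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) > 1:
        raise ValueError("all aligned sequences must have the same length")
    names = list(alignment)
    # Transpose the alignment so that each tuple is one column
    columns = zip(*(alignment[name] for name in names))
    kept = [col for col in columns if not any(base in gap_chars for base in col)]
    # Rebuild one sequence per name from the columns that were kept
    return {name: "".join(col[i] for col in kept) for i, name in enumerate(names)}


# Toy data (invented for illustration, not sequences from the study)
toy = {
    "taxon_A": "ACGT--TTTTAC",
    "taxon_B": "ACGTAATTTTAC",
    "taxon_C": "ACG-AATTT-AC",
}
cleaned = drop_gapped_columns(toy)
for name, seq in cleaned.items():
    print(name, seq)
```

Dropping a whole column whenever any sequence has a gap is the strictest option (complete deletion); it keeps the remaining positions directly comparable across all accessions at the cost of discarding some data.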

With the MEGA 5 software (Tamura et al. 2011): (1) the optimal substitution model was determined for each locus (matK: Tamura 3-parameter + Gamma distribution, rbcL: Tamura 3-parameter + Gamma distribution, trnH-psbA: Tamura 3-parameter), (2) the minimum-evolution trees were constructed and (3) the reliability of branching was tested using the bootstrap method with 1,000 replications.
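To make the substitution model named above more concrete, the sketch below computes a pairwise distance with the standard closed-form Tamura 3-parameter formula, d = -h ln(1 - P/h - Q) - (1/2)(1 - h) ln(1 - 2Q), where P and Q are the proportions of transitions and transversions and h = 2θ(1 - θ) with θ the G+C fraction. This is a simplified illustration under stated assumptions: θ is taken as the G+C fraction of the two compared sequences, no Gamma correction is applied, and the function name `t92_distance` is hypothetical. It is not a reproduction of the MEGA 5 implementation used in the study.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def t92_distance(seq1, seq2):
    """Tamura 3-parameter distance between two aligned sequences (no Gamma)."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    # Keep only positions where both sequences carry an unambiguous base
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    differences = [(a, b) for a, b in pairs if a != b]
    if not differences:
        return 0.0
    # A transition is a purine<->purine or pyrimidine<->pyrimidine change
    transitions = sum(1 for a, b in differences
                      if {a, b} <= PURINES or {a, b} <= PYRIMIDINES)
    p = transitions / n
    q = (len(differences) - transitions) / n
    # Assumption: theta is the G+C fraction pooled over both sequences
    theta = sum(base in "GC" for pair in pairs for base in pair) / (2 * n)
    h = 2 * theta * (1 - theta)
    if h == 0:
        raise ValueError("distance undefined when G+C content is 0 or 1")
    # math.log raises ValueError if the sequences are too divergent
    return -h * math.log(1 - p / h - q) - 0.5 * (1 - h) * math.log(1 - 2 * q)


# Toy pair (invented): two transitions in eight sites, G+C fraction 0.5
print(t92_distance("ACGTACGT", "GCATACGT"))
```

With a G+C fraction of exactly 0.5 the formula reduces to the Kimura 2-parameter distance, which gives a convenient sanity check for the sketch.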

Results

Twenty-nine nucleotide sequences were obtained for each of the analyzed loci (matK, rbcL and trnH-psbA). The lengths of the sequences were: 741 nt for matK, 553 nt for rbcL, and from 618 (Leymus) to 630 nt (Elymus, Lophopyrum, Pseudoroegneria) for trnH-psbA. The average GC contents were: 34.1 % for matK, 43.0 % for rbcL and 36.6 % for trnH-psbA. The estimated transition/transversion biases (R) were 1.07, 1.71 and 1.08 for matK, rbcL and trnH-psbA, respectively. In matK and rbcL the only mutations found were substitutions: matK contained 34 (4.59 %) variable sites and rbcL 15 (2.71 %) (Table 2).

Table 2 Substitutions in the DNA sequences of matK, rbcL and trnH-psbA among analyzed Triticeae specimens

The length differences among the trnH-psbA fragments were caused by indel mutations and a mononucleotide repeat (poly(T) tract) of variable length (Fig. 1). All three observed deletions were species-specific: a 9 nt deletion at positions 94–102 in T. caput-medusae, a 4 nt deletion at positions 96–99 in P. juncea and a 1 nt deletion at position 109 in T. bessarabicum. Moreover, a 6 bp inversion (positions 117–122) was discovered within trnH-psbA. The same orientation of this sequence was observed in E. caninus, H. bulbosum, H. glaucum, H. murinum, L. arenarius, L. elongatum, P. juncea, T. caput-medusae, T. bessarabicum, T. junceum and B. distachyon. The opposite (inverted) orientation was found in E. repens and T. junceiforme. Interestingly, in the remaining species (H. europaeus, H. bogdani, P. strigosa and T. intermedium) both orientations were noted among the analyzed specimens. For the minimum-evolution analysis the above-mentioned indels, inversion and poly(T) tract were removed from the multiple alignment. Thus trnH-psbA fragments of 597 nucleotides, including 9 (1.51 %) variable sites, were used for the construction of the phylogenetic trees (Table 2).
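The proportions of variable sites reported for the three loci follow directly from the counts and alignment lengths given in the text. The short Python sketch below reproduces this arithmetic and shows how variable columns could be counted in an alignment. The helper names `count_variable_sites` and `percent_variable` are hypothetical, and the three-sequence toy alignment is invented for illustration.

```python
def count_variable_sites(sequences):
    """Count alignment columns that contain more than one distinct character."""
    seqs = [s.upper() for s in sequences]
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")
    return sum(1 for column in zip(*seqs) if len(set(column)) > 1)


def percent_variable(n_variable, n_sites):
    """Share of variable sites as a percentage, rounded to two decimals."""
    return round(100 * n_variable / n_sites, 2)


# Counts and lengths as reported in the text (variable sites, aligned length)
reported = {"matK": (34, 741), "rbcL": (15, 553), "trnH-psbA": (9, 597)}
for locus, (n_variable, n_sites) in reported.items():
    print(f"{locus}: {percent_variable(n_variable, n_sites)} %")

# Toy alignment (invented): only the last column differs
print(count_variable_sites(["ACGT", "ACGA", "ACGT"]))
```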

Fig. 1 Multiple alignment of the highly variable part of the trnH-psbA region from the analyzed Triticeae species. Marked sequence polymorphisms: indels (unbroken line), inversion (dotted line) and poly(T) tract (dashed line)

For each of the three analyzed loci a minimum-evolution phylogenetic tree was constructed (Fig. 2). Two well-supported clades were present in each of the three resulting trees: (1) the Eb, Ee and St genome clade, genetically almost uniform, containing the genera Elymus, Lophopyrum, Pseudoroegneria and Thinopyrum; (2) the H, I and Xu clade, genetically differentiated, containing all of the analyzed Hordeum taxa. Another significant clade was formed by the Ns genome-containing taxa (Leymus, Psathyrostachys), which share the same matK and rbcL sequences but differ at the trnH-psbA locus. The specimens of H. europaeus (genome XoXr) showed similarity to the Ns genome-carrying taxa at matK and trnH-psbA. Taeniatherum caput-medusae (Ta genome) formed a separate clade that was close to the St/E genome group at matK and rbcL, whereas the trnH-psbA sequences of the Ta and St/E representatives were identical.

Fig. 2 The minimum-evolution dendrograms obtained for the analyzed Triticeae species on the basis of the three chloroplast sequences: matK (a), rbcL (b) and trnH-psbA (c). Numbers at nodes represent the proportion (%) of 1,000 bootstrap replicates supporting each node. Bootstrap values of less than 50 % are not shown. For the GenBank accession numbers refer to Table 1 in “Materials and methods”

Discussion

Phylogenetic relationships

The major clades previously defined within the Triticeae on the basis of DNA sequences are the following: (1) Psathyrostachys, (2) Hordeum, (3) Pseudoroegneria, (4) Aegilops-Triticum-Secale-Taeniatherum and (5) Eremopyrum-Agropyron-Australopyrum (Adderley and Sun 2014; Escobar et al. 2011; Petersen and Seberg 1997). All of the loci analyzed here revealed the Hordeum and Pseudoroegneria clades. However, only matK is variable enough to distinguish four clades with good bootstrap support: Psathyrostachys, Hordeum, Pseudoroegneria and Taeniatherum. Moreover, the minimum-evolution dendrogram obtained on the basis of the matK sequences (Fig. 2a) revealed a close relationship between the Pseudoroegneria and the Taeniatherum clades, as well as a clear distinction of the Psathyrostachys and the Hordeum clades, which is consistent with previous molecular studies (Fan et al. 2013; Escobar et al. 2011; Petersen and Seberg 1997). Thus, of the three analyzed loci, matK reflects the intergeneric phylogenetic relationships of the Triticeae most completely.

Elymus s.l. For all three analyzed loci the largest and predominantly homogeneous clade was formed by representatives of the E and the St genomes. This clade contained diploids [P. strigosa (St), L. elongatum (Ee) and T. bessarabicum (Eb)], tetraploids [E. caninus (StH), T. junceiforme (EbEe)] as well as hexaploids [E. repens (StStH), T. intermedium (EbEeSt) and T. junceum (EbEbEe)]. The nucleotide sequences obtained for the specimens from these taxa were almost identical. This group of genome representatives corresponds to the genus Elymus sensu lato (Melderis 1980). Previous cpDNA studies on the Triticeae (e.g. Adderley and Sun 2014; Mahelka et al. 2011; Mason-Gamer 2013; Redinbaugh et al. 2000; Yan and Sun 2012) have shown that during allopolyploid formation the chloroplast genomes were inherited from the parent carrying the St (nuclear) genome. Hence, the allopolyploid St-containing genera, delimited on the basis of genomic composition, probably share the same (or very closely related) chloroplast genomes that originate from Pseudoroegneria. In the case of E. repens and E. caninus our data support this view, since their plastotypes were very similar to that of Pseudoroegneria and clearly distinct from those of the species carrying the H genome. In the case of the Thinopyrum species analyzed here, such a conclusion is not justified (although it cannot be excluded), since the plastotypes of the diploid species carrying the St, Eb and Ee genomes were hardly distinguishable.

Phylogenetic relationships between the Ee and Eb genomes remain unclear. Cytogenetic studies have led their authors to opposite conclusions: Wang (1985) treated Ee and Eb (marked as JE and J, respectively) as two forms of the same genome, while Jauhar (1988), on the contrary, considered them to be distinct (with designations J and E, respectively). The nucleotide sequences of the diploid Ee, Eb and St taxa analyzed in this study are almost the same. Only one substitution within matK differentiates T. bessarabicum (Eb) from the Ee and St diploid taxa. Sequencing of the chloroplast rpoA gene showed the same result: Lophopyrum and Pseudoroegneria shared identical sequences, whereas T. bessarabicum diverged slightly (Petersen and Seberg 1997). Our results confirm the very close relationship among the Ee, Eb and St taxa and support the inclusion of Lophopyrum in the genus Pseudoroegneria (Petersen and Seberg 1997).

Hordelymus-Leymus-Psathyrostachys. Generally, the obtained data indicate a close relationship between P. juncea (Ns), L. arenarius (NsXm) and H. europaeus (XoXr), with H. europaeus being slightly distinct from the two others. Hordelymus europaeus represents a monotypic genus of unclear origin. According to Löve (1984) it is an allotetraploid species containing the H (Hordeum) and Ta (Taeniatherum Nevski) genomes. von Bothmer et al. (1994) excluded the presence of the H genome in Hordelymus and discovered a similarity of its genomes to Ta and Ns. As this similarity was uncertain, Wang et al. (1995) proposed to denote the Hordelymus genomes temporarily as XoXr. The close similarity to the Ns genome was later confirmed by Petersen and Seberg (2008). Moreover, Southern and fluorescence in situ hybridization indicated that H. europaeus is an autotetraploid carrying only the Ns genome (Ellneskog-Staam et al. 2006). The chloroplast non-coding trnS-psbC sequences of Hordelymus are similar to those of T. caput-medusae subsp. caput-medusae (Ta), P. juncea (Ns) and also H. bogdani (H) (Ni et al. 2011). The matK and trnH-psbA sequences analyzed during this study revealed a close affinity between the chloroplast genomes of Hordelymus and the Ns-carrying taxa (Psathyrostachys, Leymus). Moreover, none of the examined sequences confirmed a close relationship of Hordelymus with Taeniatherum (Ta) or Hordeum (H). Therefore, our data indicate that the maternal component of H. europaeus likely represented the genus Psathyrostachys, and accordingly they support the presence of an Ns-related genome in this species.

Psathyrostachys Nevski, a genus of about ten mainly diploid species carrying the Ns genome, is also considered to be one of the progenitors of the allopolyploid genus Leymus (Adderley and Sun 2014; Anamthawat-Jónsson and Bödvarsdóttir 2001; Culumber et al. 2011; Mizianty et al. 1999). The identity of the second Leymus progenitor, responsible for the Xm genome component, remains unclear (Guo et al. 2014). The results of this study showed a close affinity between the cpDNA of Psathyrostachys and Leymus, with only a single substitution in the trnH-psbA locus differentiating these genera. Therefore, the presented data fully support the contribution of Psathyrostachys to the origin of Leymus and indicate the maternal character of this contribution.

Hordeum. All of the analyzed loci showed Hordeum to be an independent group. Moreover, the analysis separated the representatives of the H (diploid H. bogdani), I (diploid or tetraploid H. bulbosum) and Xu (diploid H. glaucum, tetraploid H. murinum) genomes. This division corresponds to the sections Campestria Anderson, Cerealia Anderson and Trichostachys Dumortier, respectively (Yen et al. 2005). Furthermore, the obtained results are consistent with previous studies of chloroplast or nuclear DNA sequences (e.g. Blattner 2009; Naghavi et al. 2013). At each of the analyzed loci the sequences of the tetraploid H. murinum specimens from Central Europe were identical to those of their probable progenitor, the diploid H. glaucum. Therefore, a close relationship of the Xu-genome taxa is evident in the present study, although some differences between H. murinum and H. glaucum were found within the trnL-trnF locus by Jakob and Blattner (2010).

DNA barcoding

DNA barcoding aims at the identification of organisms to the species level. None of the loci analyzed here allows the identification of species or even genera carrying the Eb, Ee and St genomes (Elymus, Lophopyrum, Pseudoroegneria, Thinopyrum). The only exception is one substitution within matK that is typical for the diploid T. bessarabicum (Eb). Such species-specific mutations have already been reported for the chloroplast genome. Mahelka et al. (2011) separated L. elongatum and T. bessarabicum from P. strigosa on the basis of the trnL-trnF sequences. Mason-Gamer (2013) found chloroplast mutations that separated E. caninus and E. repens from L. elongatum, T. bessarabicum and P. strigosa. Ni et al. (2011) differentiated L. elongatum from T. bessarabicum and the St clade on the basis of the trnT-psbC sequence.

Similarly, the specimens of P. juncea and L. arenarius analyzed in this study had identical matK and rbcL sequences. Nevertheless, some differences between the genera Psathyrostachys and Leymus were found in other cpDNA loci, e.g. trnH-psbA and rps16-trnK (Culumber et al. 2011).

The analyzed markers can be used to identify representatives of the various Hordeum genomes, i.e. I (H. bulbosum), H (H. bogdani) and Xu (H. murinum, H. glaucum). None of these markers made it possible to distinguish between the diploid H. glaucum and the tetraploid H. murinum within the Xu group, although mutations differentiating these taxa were found earlier at the trnL-trnF locus (Jakob and Blattner 2010). Both Hordelymus and Taeniatherum exhibited unique polymorphisms within the matK and rbcL loci, while with respect to the trnH-psbA sequence Hordelymus remained distinct from the other taxa and Taeniatherum was indistinguishable within the E-St clade.

The trnH-psbA locus in DNA barcoding of Triticeae

The density of substitutions identified within the trnH-psbA spacer was the lowest among the three analyzed loci. The major contribution of the trnH-psbA analysis to the species identification reported here was the separation of L. arenarius from P. juncea on the basis of a single substitution. It is also worth mentioning the species-specific deletions found in the trnH-psbA sequences of P. juncea, T. caput-medusae and T. bessarabicum. Although these polymorphisms were removed from the alignment, as they are not routinely used in phylogenetic analysis, they can serve as very useful diagnostic tools in the DNA barcoding context. Another interesting feature of the examined trnH-psbA sequence was the presence of a minute (6 bp) inversion. This inversion is located between inverted repeats of approx. 20 bp and shows intraspecific variation in five of the analyzed species: H. europaeus, H. bogdani, P. strigosa, T. junceiforme and T. intermedium. According to Kim and Lee (2005) such small inversions are very common in the chloroplast genomes of land plants and likely result from intramolecular recombination between the surrounding inverted sequences. These authors also reported that within Jasminum elegans both orientations of such an inversion were present. Another inversion of this type was found by Whitlock et al. (2010) in the trnH-psbA spacer in Gentianaceae. Our results show the plasticity of a similar polymorphism across the whole range of taxa, indicating that cpDNA sequence data must be carefully inspected for the presence of small inversions to avoid species misidentification.
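The inspection for small inversions recommended above can be partly automated. The following Python sketch is one possible screen, offered under stated assumptions: it compares two gap-free, equal-length aligned sequences and reports windows in which one sequence is the reverse complement of the other. The function names, the window limits and the toy sequences are hypothetical and were not used in the study; a reported window may also extend into flanking inverted repeats when these are present.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_small_inversions(seq_a, seq_b, min_len=4, max_len=12):
    """List (start, end) windows, 0-based and half-open, where the two
    sequences differ and seq_b is the reverse complement of seq_a."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    seq_a, seq_b = seq_a.upper(), seq_b.upper()
    hits = []
    # Scan from the largest window down so nested matches can be skipped
    for size in range(max_len, min_len - 1, -1):
        for start in range(len(seq_a) - size + 1):
            end = start + size
            a, b = seq_a[start:end], seq_b[start:end]
            if a != b and b == reverse_complement(a):
                # Ignore windows lying inside an already reported window
                if not any(s <= start and end <= e for s, e in hits):
                    hits.append((start, end))
    return sorted(hits)


# Toy pair (invented): a 6 bp segment at positions 6-11 is inverted in seq_b
seq_a = "TTGACCAAAGGCCATTGA"
seq_b = "TTGACCGCCTTTCATTGA"
print(find_small_inversions(seq_a, seq_b))
```

A window flagged by such a screen could then be re-oriented or masked before distances are computed, so that a single inversion event is not counted as several independent substitutions.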