The selected portion of the matK gene was successfully amplified, and the amplicons were sequenced and deposited in GenBank under accession numbers MN047218, MN047219, MN047220, MN047221, MN047222, MN062364, MN062365, MN062366, MN062367, MN062368, MN062369, MN062370, MN062371, MN062372, MN062373, MN062374, MN062375, MN062376, MN062377, and MN062378 for all studied Triticum species (Fig. 1 and Table 1). The amplified partial matK sequence was about 454 bp long in all studied samples, with 189 monomorphic nucleotide positions and 265 polymorphic sites (58.37% polymorphism). The average GC content was about 35.3% across all tested samples.
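The site counts and GC content reported above can be reproduced from an alignment with a few lines of Python. The following is a minimal sketch, assuming equal-length aligned sequences; the function names and the toy alignment are hypothetical and are not taken from the study.

```python
from typing import Sequence


def count_sites(alignment: Sequence[str]) -> tuple[int, int]:
    """Return (monomorphic, polymorphic) column counts.

    Assumption: sequences are already aligned to equal length. Every
    character, including a gap, is treated as a distinct state.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must have the same aligned length")
    polymorphic = sum(1 for column in zip(*alignment) if len(set(column)) > 1)
    return length - polymorphic, polymorphic


def gc_percent(seq: str) -> float:
    """GC content in percent, counting only unambiguous A/C/G/T bases."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return 100.0 * sum(b in "GC" for b in bases) / len(bases)


# Toy alignment (hypothetical data, for illustration only).
toy = ["ATGCA", "ATGTA", "ACGCA"]
mono, poly = count_sites(toy)

# The reported percentage follows the same arithmetic: 265 of 454 sites.
reported_pct = round(100 * 265 / 454, 2)
```

With the figures given in the text, `reported_pct` evaluates to 58.37, and 189 + 265 accounts for all 454 positions.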
The nucleotide sequences of all studied species were analyzed with the Tamura (1992) model to estimate the rates of the different transitional and transversional substitutions, as shown in Table 2. Base substitution is the basis of single-nucleotide polymorphism (SNP) and involves either a transition (pyrimidine to pyrimidine or purine to purine) or a transversion (pyrimidine to purine or vice versa). The estimated transition/transversion bias (R) was 0.99. From the estimated substitution pattern and rates, the nucleotide frequencies were A = 32.34%, T/U = 32.34%, C = 17.66%, and G = 17.66%.
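The transition/transversion distinction and the paired nucleotide frequencies can be illustrated briefly. Under the Tamura (1992) model, the equilibrium frequencies depend only on the GC fraction θ, with A = T = (1 − θ)/2 and G = C = θ/2, which is why the reported A and T values are equal, as are C and G. The sketch below uses hypothetical function names, and θ = 0.3532 is an assumption back-calculated from the reported C + G frequencies.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def classify_substitution(a: str, b: str) -> str:
    """Classify a base pair as 'identical', 'transition', or 'transversion'."""
    a, b = a.upper(), b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("only unambiguous A/C/G/T bases are supported")
    if a == b:
        return "identical"
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def t92_frequencies(gc_fraction: float) -> dict[str, float]:
    """Equilibrium base frequencies implied by a single GC fraction."""
    at = (1.0 - gc_fraction) / 2.0
    gc = gc_fraction / 2.0
    return {"A": at, "T": at, "C": gc, "G": gc}


# Assumed GC fraction: 17.66% + 17.66% = 35.32%.
freqs = t92_frequencies(0.3532)
```

With this θ, the function returns A = T ≈ 0.3234 and C = G ≈ 0.1766, consistent with the frequencies and the roughly 35.3% GC content stated in the text.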
Table 2 Maximum likelihood estimate of substitution matrix
Molecular phylogenetic analysis based on DNA sequence of partial matK gene
The sequence of the chloroplast matK gene was used to verify the phylogenetic relationships of the studied Triticum species. The evolutionary history was inferred using the maximum likelihood method based on the Tamura 3-parameter model, with two statistical analyses: bootstrapping and pairwise distance. Both analyses gave the same phylogenetic tree (Figs. 2 and 3). The tree divided all 20 studied Triticum samples into two groups, A and B. Group A (green) comprised the diploid species, 2n = 2x = 14 (T. monococcum L., AmAm; common name einkorn), collected from three countries: Iraq (IG 109083), Iran (IG 113259), and Syria (IG 44936). This group split into two subgroups: the first contained T. monococcum L. from Iran and Syria, and the second contained T. monococcum L. from Iraq only. Group B split into two subgroups, I and II. Subgroup I (red) comprised the hexaploid species, 2n = 6x = 42 (T. aestivum, BBAuAuDD), and subgroup II (blue) comprised the tetraploid species, 2n = 4x = 28 (T. turgidum subsp. dicoccoides, BBAuAu (wild emmer); T. dicoccon subsp. dicoccon, BBAuAu (emmer); and T. turgidum subsp. durum, BBAuAu (macaroni wheat)). Within subgroup I, T. aestivum from India (TRI 28936) and Libya (TRI 13955) were closely related to each other, whereas the T. aestivum accessions Egyptian cultivar Sids 4 and Egyptian landrace Qena, Nag Hamad 27 differed from each other and from the Indian and Libyan accessions. Likewise, the T. aestivum accessions Egyptian cultivar Giza 168 and Egyptian landrace New Valley, Dakhla 7 differed from each other and from all other T. aestivum accessions. Subgroup II (blue) was divided into two clusters. The first cluster split into two subclusters; the first subcluster contained T. turgidum subsp.
dicoccoides (wild emmer) from Syria, code numbers IG 46467 and IG 46447, indicating that these two accessions were closely related to each other, while the second subcluster contained only T. dicoccon subsp. dicoccon (emmer) from Ursprungsland (TRI 28920). The second cluster contained T. turgidum subsp. durum (Desf.) and was divided into two subclusters. The first subcluster contained T. turgidum subsp. durum from Turkey (TRI 28834) and Iran (TRI 19242), which were closely related to each other. The second subcluster was divided into two sections. The first section contained T. turgidum subsp. durum from Italy (TRI 27360 and 27284), which were closely related to each other. The second section was divided into two subsections: the first subsection contained T. turgidum subsp. durum from Egypt (TRI 19223) and the Egyptian cultivar Sohag 4, while the second subsection contained the T. turgidum subsp. durum Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41.
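Bootstrap support in phylogenetic analysis rests on resampling alignment columns with replacement and rebuilding a tree from each replicate. The sketch below covers only the resampling step, not tree inference; the function name and toy alignment are hypothetical and do not represent the software used in the study.

```python
import random


def bootstrap_alignment(alignment: dict[str, str], rng: random.Random) -> dict[str, str]:
    """Return one bootstrap replicate of an alignment.

    Columns are drawn with replacement, and the same column indices are
    applied to every sequence so that homologous sites stay together.
    """
    names = list(alignment)
    length = len(alignment[names[0]])
    if any(len(alignment[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in columns) for name in names}


# Toy alignment (hypothetical data); a fixed seed keeps the run repeatable.
toy = {"sample_1": "ATGCA", "sample_2": "ATGTA", "sample_3": "ACGCA"}
replicate = bootstrap_alignment(toy, random.Random(0))
```

In a full analysis, many such replicates would be generated, and the fraction of replicate trees that recover a given grouping would be reported as its bootstrap support.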
Estimates of evolutionary divergence between sequences
The number of base substitutions per site was estimated between the nucleotide sequences (454 bp) of all studied Triticum species, as shown in Table 3. All ambiguous positions were removed for each sequence pair. Analyses were performed using the Tamura 3-parameter model in MEGA version 6. Table 3 and Figs. 2 and 3 show that the highest evolutionary divergence (0.37) was between T. monococcum L. from Syria (IG 44936) and T. turgidum subsp. dicoccoides from Syria (IG 46447), indicating that these two species were highly different. The lowest evolutionary divergence (0.02) was found between T. turgidum subsp. durum from Turkey (TRI 28834) and Iran (TRI 19242), and also between the T. turgidum subsp. durum Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41, indicating high similarity within each pair. The evolutionary divergence between the Egyptian T. aestivum cultivars (Giza 168 and Sids 4) was 0.12, while that between the Egyptian T. aestivum landraces (New Valley, Dakhla 7 and Qena, Nag Hamad 27) was 0.08, indicating that the differences found among the Egyptian cultivars and among the Egyptian landraces were relatively low.
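For readers who want to see how a pairwise divergence of this kind is obtained, the Tamura (1992) distance is d = −h ln(1 − P/h − Q) − ½(1 − h) ln(1 − 2Q), where P and Q are the proportions of transitional and transversional differences and h = 2θ(1 − θ) for GC fraction θ. The sketch below is a minimal illustration, not the MEGA implementation: it assumes θ is the pooled GC fraction of the two sequences and drops ambiguous positions pairwise; the function name is hypothetical.

```python
import math

TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def t92_distance(seq1: str, seq2: str) -> float:
    """Tamura 3-parameter distance between two aligned sequences."""
    # Pairwise deletion: keep only sites where both bases are unambiguous.
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    ts = sum(1 for a, b in pairs if a != b and frozenset((a, b)) in TRANSITIONS)
    tv = sum(1 for a, b in pairs if a != b and frozenset((a, b)) not in TRANSITIONS)
    p, q = ts / n, tv / n
    # Assumption: theta is the GC fraction pooled over both sequences.
    theta = sum(base in "GC" for pair in pairs for base in pair) / (2 * n)
    h = 2.0 * theta * (1.0 - theta)
    if h == 0.0:
        raise ValueError("GC fraction of 0 or 1 makes the distance undefined")
    arg1, arg2 = 1.0 - p / h - q, 1.0 - 2.0 * q
    if arg1 <= 0.0 or arg2 <= 0.0:
        raise ValueError("sequences too divergent for this correction")
    return -h * math.log(arg1) - 0.5 * (1.0 - h) * math.log(arg2)


# Toy pair (hypothetical data): one transitional difference in ten sites.
d = t92_distance("ACGTACGTAC", "ACGTACGTAT")
```

The corrected distance is always at least as large as the raw proportion of differing sites, because it accounts for multiple substitutions at the same position.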
Table 3 Estimates of evolutionary divergence between sequences of 20 different Triticum species
Estimates of base composition bias difference between sequences
From the analysis of all nucleotide sequences, the difference in base composition bias per site was computed and recorded in Table 4 (Kumar and Gadagkar 2001). Even when the substitution patterns are homogeneous among lineages, the compositional distance will correlate with the number of differences between sequences. Table 4 and Figs. 2 and 3 show that the highest compositional distance (0.83) was between T. monococcum L. (Iran, IG 113259) and T. turgidum subsp. dicoccoides (Syria, IG 46447). In contrast, T. turgidum subsp. durum from Italy (TRI 127360 and TRI 127284) and the T. turgidum subsp. durum Egyptian landraces (Sohag, Almonshaah 34 and Sohag, Almonshaah 41) showed no compositional distance. The compositional distances between the Egyptian T. aestivum cultivar Sids 4 and the two Egyptian T. aestivum landraces (New Valley, Dakhla 7 and Qena, Nag Hamad 27) were 0.07 and 0.09, respectively, indicating that the compositional distance between these two landraces and cultivar Sids 4 was very low.
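The compositional distance can be sketched as follows. The formula used here, one half of the summed squared differences in base counts divided by the number of compared sites, is an assumption based on the commonly cited definition from Kumar and Gadagkar (2001) and should be checked against that reference; the function name and toy sequences are hypothetical.

```python
from collections import Counter


def composition_distance_per_site(seq1: str, seq2: str, alphabet: str = "ACGT") -> float:
    """Per-site compositional distance between two aligned sequences.

    Assumed definition: 0.5 * sum_i (x_i - y_i)^2 / n, where x_i and y_i
    are the counts of base i and n is the number of sites compared.
    """
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in alphabet and b in alphabet]
    n = len(pairs)
    if n == 0:
        raise ValueError("no comparable sites")
    counts1 = Counter(a for a, _ in pairs)
    counts2 = Counter(b for _, b in pairs)
    squared = sum((counts1[base] - counts2[base]) ** 2 for base in alphabet)
    return 0.5 * squared / n


# Same composition, different order (hypothetical data).
same_composition = composition_distance_per_site("ACGT", "TGCA")
# Completely different composition (hypothetical data).
different_composition = composition_distance_per_site("AAAA", "CCCC")
```

The first toy pair shows that a compositional distance of zero means only that the two sequences have the same base counts, not that they are identical.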
Table 4 Estimates of base composition bias difference between sequences
Molecular phylogenetic analysis based on amino acid sequence from partial matK gene translation
The translated amino acid sequences were used to detect the phylogenetic relationships among all studied species. The amino acid sequences of all studied species contained all 20 amino acids: alanine, cysteine, aspartic acid, glutamic acid, phenylalanine, glycine, histidine, isoleucine, lysine, leucine, methionine, asparagine, proline, glutamine, arginine, serine, threonine, valine, tryptophan, and tyrosine, with average percentage frequencies of 1.06, 1.49, 4.77, 6.03, 8.44, 1.42, 4.11, 4.11, 4.44, 14.14, 1.26, 5.79, 6.62, 5.03, 4.74, 8.31, 0.73, 6.75, 1.16, and 6.19, respectively, as shown in Table 5.
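Percentage frequencies of this kind can be tallied from a translated sequence as sketched below. The amino acids are listed in the same order as in the text, which is alphabetical by one-letter code. The function name and the toy peptide are hypothetical; stop codons and unknown residues are simply skipped in this sketch.

```python
from collections import Counter

# One-letter codes in the order used in the text (A, C, D, ..., W, Y).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_percentages(protein: str) -> dict[str, float]:
    """Percentage frequency of each of the 20 standard amino acids."""
    residues = [r for r in protein.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return {aa: 100.0 * counts[aa] / len(residues) for aa in AMINO_ACIDS}


# Toy peptide (hypothetical data); '*' marks a stop and is ignored.
percentages = amino_acid_percentages("LLKF*")
```

Averaging the per-sample dictionaries over all samples would give one mean value per amino acid, as in the last row of a table such as Table 5.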
Table 5 Types of amino acid and their percentage frequencies in each sample, and the average over all 20 different Triticum species, in the amino acid sequence
The evolutionary history was inferred using the Maximum Likelihood method based on the Tamura 3-parameter model, with two statistical analyses: bootstrapping and pairwise distance. The two analyses gave different phylogenetic trees (Figs. 4 and 5). The tree from the bootstrapping analysis matched the tree based on the nucleotide sequences, except for the following differences. Group B was divided into two subgroups, I and II. Subgroup I (red) split into two clusters. The first cluster was divided into two subclusters: the first subcluster contained T. aestivum (Egyptian cultivar Sids 4 and Egyptian landrace New Valley, Dakhla 7), while the second subcluster contained only the T. aestivum Egyptian cultivar Giza 168. The second cluster was divided into two subclusters: the first subcluster contained T. aestivum from India (TRI 28936) and Libya (TRI 13955), indicating that these two accessions were closely related to each other, while the second subcluster contained only the T. aestivum Egyptian landrace Qena, Nag Hamad 27. Subgroup II (blue) was divided into two clusters. The first cluster split into two subclusters: the first subcluster contained T. turgidum subsp. dicoccoides (wild emmer) from Syria, code numbers IG 46467 and IG 46447, indicating that these two accessions were closely related to each other, while the second subcluster contained only T. dicoccon subsp. dicoccon (emmer) from Ursprungsland (TRI 28920). The second cluster contained T. turgidum subsp. durum (Desf.) and split into two subclusters in the bootstrapping analysis; the first subcluster was split into two sections. The first section was divided into two subsections, and the first section contained T.
turgidum subsp. durum from Egypt (TRI 19223) and the Egyptian cultivar Sohag 4, but the second section contained the T. turgidum subsp. durum Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41; the second subcluster was split into two sections. The first section was split into two subsections: the first subsection contained two T. turgidum subsp. durum accessions from Italy (TRI 27360 and 27284), which were closely related to each other, and the second subsection contained T. turgidum subsp. durum from Turkey (TRI 28834) and Iran (TRI 19242), which were also closely related to each other. The phylogenetic tree from the pairwise distance analysis matched the tree from the bootstrapping analysis, except for the following differences. The second cluster of subgroup II contained T. turgidum subsp. durum (Desf.) and was divided into two subclusters. The first subcluster contained T. turgidum subsp. durum from Turkey (TRI 28834) and Iran (TRI 19242), which were closely related to each other. The second subcluster consisted of two sections. The first section contained T. turgidum subsp. durum from Italy (TRI 27360 and 27284), which were closely related to each other. The second section was divided into two subsections: the first subsection contained T. turgidum subsp. durum from Egypt (TRI 19223) and the Egyptian cultivar Sohag 4, and the second subsection contained the T. turgidum subsp. durum Egyptian landraces Sohag, Almonshaah 34 and Sohag, Almonshaah 41.