Background

Tetraploid wheat (2n = 4x = 28; AABB or AAGG) shows considerable variation in genetic and morphological traits; however, its evolution under domestication has not yet been extensively studied or reported [1]. The tetraploid wheat group is relatively divergent and includes species such as Triticum timopheevii, T. araraticum, T. dicoccoides, T. militinae, T. dicoccum, T. carthlicum, T. polonicum, T. ispahanicum, T. turgidum, T. karamyschevii, T. turanicum, T. aethiopicum, and T. durum [2]. Durum wheat descends from a cross between Aegilops speltoides and Triticum urartu and was domesticated from Triticum turgidum ssp. dicoccum in the Fertile Crescent around 6000 BC [1, 3,4,5]. North Africa and the Abyssinian region have been described as secondary centers of diversity for durum wheat [6]. Durum wheat plays an important role in food production and is therefore one of the most important crops for humans. Durum wheat landraces have higher genetic diversity than breeding populations [7]; they are considered valuable parental germplasm and are used in many wheat breeding programs. Wild relatives and landraces of Triticum turgidum are a rich gene pool for agricultural purposes and a new source of material for developing modern cultivars [8, 9]. Investigating their genetic variation is therefore valuable for improving marker-assisted selection in breeding programs [2].

Molecular markers have been widely applied to study the genetic and structural heterogeneity of collected or natural germplasm [10,11,12,13]. They are central to evaluating variation-related indices, which in turn facilitates screening in breeding programs [14]. Because single nucleotide polymorphisms (SNPs) cover the whole plant genome, SNP-based markers appear to be the most widely used markers in plant breeding [15]. They are suitable for examining population genetic variation, marker-assisted selection (MAS), QTL mapping, and map-based cloning, all of which are commonly used in plant breeding programs [16].

So far, various molecular markers have been used to study genetic diversity in durum wheat [17,18,19]. However, the development of high-throughput sequencing methods and high-resolution SNP-based maps of wheat in recent years has greatly advanced genetic research on this crop [20,21,22]. For instance, a study of 370 durum wheat samples using an Axiom 35 K array not only separated improved varieties and cultivars but also showed that the Middle East and Ethiopia had the most allelic uniformity among the investigated population [23]. A similar study reported high genetic diversity in durum wheat landraces [24]. Analyses of the population structure and genetic diversity of a worldwide set of durum wheat indicate that breeding programs have had different effects on the genome of this plant [25]. Although it has been concluded that the durum wheat germplasm of some countries is related [26] and that the level of genetic diversity of durum wheat germplasm is higher in some countries than in others [27], further research is needed. Several reports have indicated that genotyping-by-sequencing has been increasingly accepted as a low-cost, high-throughput molecular method for genome-wide SNP coverage [20, 28, 29], genotyping, SNP discovery, detection of domestication signatures, and studies of genetic variation in different plant species, including tetraploid wheat landraces and cultivars [30,31,32,33]. Despite this research, the population and genome-wide structure of tetraploid wheat landraces still needs to be assessed using high-throughput SNP genotyping. Filling this gap by studying the genetic structure of tetraploid wheat landraces with a high-density SNP array will help breeders in conservation and hybridization programs. This study therefore aimed to investigate the genetic variation and differentiation of tetraploid wheat landraces from nine countries using the 55 K Affymetrix SNP array.

Results

The genome SNP distribution of investigated tetraploid wheat

A total of 23,334 polymorphic SNPs were detected in the 126 tetraploid wheat landraces relative to the reference genome. The A and B genomes carried 11,613 and 11,721 SNPs, respectively. The number of SNPs per chromosome ranged from 1339 (chromosome 4A) to 2005 (chromosome 5B). SNP density was lowest on chromosome 3B (1.70 SNP/Mbp) and highest on chromosome 6A (3.21 SNP/Mbp), with an average of 2.34 SNP/Mbp (Table 1).

Table 1 A summary of single nucleotide substitutions identified in durum wheat chromosomes and genomes

Although the numbers of transition- and transversion-type SNPs differed among chromosomes, the transition/transversion ratio was similar across the chromosomes of both genomes. Transitions (75.25%) were more frequent than transversions (24.75%), giving a transition (Ts) to transversion (Tv) ratio of 3.04 (17,560/5,774) over both genomes (Table 1).
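
The Ts/Tv figures above can be reproduced with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and it simply classifies a substitution by purine/pyrimidine class and recomputes the ratio from the two totals quoted in the text.

```python
# Purines and pyrimidines: a transition stays within one class,
# a transversion crosses between the two classes.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(ref: str, alt: str) -> str:
    """Classify a single-nucleotide substitution as 'Ts' or 'Tv'."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or not {ref, alt} <= (PURINES | PYRIMIDINES):
        raise ValueError(f"not a valid SNP: {ref}>{alt}")
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "Ts" if same_class else "Tv"


def ts_tv_ratio(n_transitions: int, n_transversions: int) -> float:
    """Ratio of transition count to transversion count."""
    return n_transitions / n_transversions


# Totals quoted in the text for both genomes (Table 1).
N_TS, N_TV = 17560, 5774
ratio = ts_tv_ratio(N_TS, N_TV)
ts_share = N_TS / (N_TS + N_TV)
print(f"Ts/Tv = {ratio:.2f}; transitions = {ts_share:.2%}")
```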

Genetic diversity and the polymorphism information content (PIC)

The highest PIC value was observed for SNPs on chromosome 6B (0.29) and the lowest on chromosome 6A (0.26) (Fig. 1). Across all chromosomes, gene diversity (GD) ranged from 0.1 (200 SNPs) to 0.6 (65 SNPs) with an average of 0.27, and PIC ranged from 0.1 (228 SNPs) to 0.4 (10,255 SNPs) with an average of 0.46 (Figs. 1 and 2a and b). Approximately 61% of SNPs across all chromosomes had PIC values greater than 0.25, which indicates relatively high polymorphism for the majority of markers (Fig. 2a and b). More than 90% of SNPs (21,127 SNPs) had a minor allele frequency greater than 0.1 (Fig. 2c). GD, PIC, and minor allele frequency (MAF) values were similar across chromosomes. The highest and lowest values of GD, PIC, and MAF were obtained for chromosomes 6B and 6A, respectively (Fig. 1).

Fig. 1

Distribution of gene diversity (GD), polymorphic information content (PIC), and minor allele frequency (MAF) in the different chromosomes for 23,334 SNP markers in the 126 tetraploid wheat landraces

Fig. 2

Frequency distribution. Gene diversity (a). Polymorphism information content (b). Minor allele frequency (c)

The relationship and structure of the population

Delta K (ΔK) and the log-likelihood [LnP(D)] were used to assess the population structure of the tetraploid wheat panel and to determine the number of subgroups (K). LnP(D) increased gradually with increasing K (Fig. 3a), and the best K value was K = 2, indicating that the 126 tetraploid wheat landraces could most probably be divided into two groups. Similarly, the largest ΔK was observed at K = 2, confirming two subgroups in the panel (Fig. 3b). The first group consisted of 15 samples and the second of 111 samples (Fig. 3c). Clustering based on the kinship matrix also showed that the association mapping panel was composed of two classes, with significant genetic variation among the landraces (i.e., red to yellow in the heat map clustering output). The pairwise relative kinship coefficients among the 126 tetraploid wheat landraces ranged from −0.81 to 4.22. About 68% of the relative kinship values were between zero and 0.05, 26% were between 0.05 and 0.50, and only 6% were greater than 0.50. The kinship heatmap showed that most values were concentrated between zero and 0.05, indicating weak relatedness between most pairs of tetraploid wheat landraces used in this study (Fig. 3c; Supplementary 1).

Fig. 3

The average log-likelihood value (a). Delta K for differing numbers of subpopulations (k) (b). Heatmap of pair-wise kinship matrix values and structure plot of the 126 tetraploid wheat landraces determined by K = 2 using 23,334 SNP markers (c)

Cluster analysis was also performed using WPGMA to construct a dendrogram from a pairwise similarity matrix (Fig. 4). WPGMA clustering also divided the panel into two classes, consistent with the structure analysis; the only exception was genotype 45,148, which originated from Turkey. The first main cluster (I) consisted of 15 samples: eight from Turkey, three from Ukraine, two from Iran, one from Russia, and one from Afghanistan. The second main cluster (II) included 111 samples originating from a variety of countries, but none from Iran.

Fig. 4

WPGMA clustering dendrogram generated using 23,334 SNP markers and 126 tetraploid wheat landraces. Colors of genotypes code reflect countries of origin

The PCoA results agreed with the WPGMA-based clustering, dividing the 126 landraces into two groups (Fig. 5). The first and second coordinates explained 39.16% and 6.52% of the total diversity, respectively. PCoA1 separated the two groups well: group I lay near the origin of the biplot, whereas group II had high negative values (Fig. 5). Genetic variability among the landraces of different countries based on the WPGMA method is shown in Fig. 6. Three clusters were observed: Iran was clearly distinguished from the other countries; Afghanistan and Ukraine formed one branch; and the remaining countries (China, Armenia, Kazakhstan, Azerbaijan, Russia, and Turkey) clustered together.

Fig. 5

Principal coordinate analysis (PCoA) of 126 tetraploid wheat landraces based on 23,334 SNP markers. Colors of genotypes code reflect countries of origin

Fig. 6

Dendrogram generated using 23,334 SNP markers and 126 tetraploid wheat landraces collected from different countries of origin. TUR Turkey, RUS Russia, AZE Azerbaijan, KAZ Kazakhstan, UKR Ukraine, ARM Armenia, AFG Afghanistan, CHN China, IRN Iran

Genetic differentiation of populations

AMOVA was performed on the basis of both the countries of origin and the two subpopulations identified by the structure analysis (Table 2). The AMOVA based on country of origin attributed 9.40% of the total variation to differences among subpopulations and the remaining 90.60% to variation within subpopulations. In contrast, the AMOVA based on the structure results attributed more variation to differences among subpopulations (53.24%, p < 0.001) than to variation within them. The fixation index (Fst) of 0.094 among subpopulations from different countries implied a considerable degree of differentiation among them, while the much higher Fst (0.532) between the two structure-based subpopulations implies great differentiation between them. The Iran subpopulation showed the highest genetic differentiation (Fst) from the other subpopulations, followed by the China subpopulation (Table 3). Accordingly, gene flow between the Iran subpopulation and the others was much lower than that across the entire range. The highest gene flow was observed between Russia and Turkey (≈ 14.43) and between Russia and Azerbaijan (≈ 6.27).
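
To make the link between Fst and gene flow concrete, the sketch below converts one into the other. It assumes Wright's island-model approximation, Nm = (1/Fst − 1)/4, which is a common convention but is not stated in the text; the study reports a "haploid" Nm that may use a different denominator, so the values produced here need not match Table 3. The function names are hypothetical.

```python
def nm_from_fst(fst: float) -> float:
    """Estimate the number of migrants per generation from Fst.

    Assumption: Wright's island-model approximation Nm = (1/Fst - 1) / 4.
    """
    if not 0.0 < fst < 1.0:
        raise ValueError("Fst must lie strictly between 0 and 1")
    return (1.0 / fst - 1.0) / 4.0


def fst_from_nm(nm: float) -> float:
    """Inverse of nm_from_fst under the same assumption."""
    if nm <= 0.0:
        raise ValueError("Nm must be positive")
    return 1.0 / (4.0 * nm + 1.0)


# The two Fst values quoted in the text: among countries and between
# the two structure-based subpopulations.
for label, fst in [("countries", 0.094), ("structure groups", 0.532)]:
    print(f"{label}: Fst = {fst:.3f} -> Nm = {nm_from_fst(fst):.2f}")
```

Under this approximation, a lower Fst always maps to a higher Nm, which is why the pairs with the smallest differentiation in Table 3 show the largest gene flow.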

Table 2 AMOVA analysis of 126 durum wheat landraces
Table 3 Gene flow (Nm, upper diagonal) and pair-wise genetic differentiation (Fst, below diagonal) among durum wheat landraces

The allelic pattern across the populations

Genetic variation within populations grouped by country showed that the average numbers of observed (Na) and effective (Ne) alleles were 1.755 and 1.599, respectively (Table 4). The lowest Na (1.093) and Ne (1.071) were observed in the Iranian group. Shannon's diversity index (I) varied from 0.06 (Iran group) to 0.53 (Ukraine group). A similar pattern was seen for expected heterozygosity (Nei's gene diversity, He), which ranged from 0.041 (Iran group) to 0.368 (Ukraine group). The highest inbreeding coefficients (F) were found in the Ukraine (0.928) and Afghanistan (0.925) groups, while the Iran group showed the lowest value (-0.362). The percentage of polymorphic loci (PPL) per group varied from 9.66% (Iran group) to 99.96% (Turkey group). Diversity analysis based on the structure results showed that structure group I had lower values of Na, Ne, I, He, F, and PPL than structure group II.
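
For readers unfamiliar with these indices, the sketch below computes them for a single locus from its allele frequencies using the standard textbook definitions (Ne = 1/Σp², I = −Σ p ln p, He = 1 − Σp², F = 1 − Ho/He). It is not the software used in the study, and the function name and dictionary keys are hypothetical.

```python
import math


def allelic_indices(freqs, observed_het=None):
    """Per-locus Ne, Shannon's I, He and (optionally) F.

    freqs: allele frequencies at one locus (must sum to 1).
    observed_het: observed heterozygosity Ho, if available.
    """
    freqs = [p for p in freqs if p > 0]
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    sum_sq = sum(p * p for p in freqs)
    he = 1.0 - sum_sq
    result = {
        "Ne": 1.0 / sum_sq,                           # effective alleles
        "I": -sum(p * math.log(p) for p in freqs),    # Shannon's index
        "He": he,                                     # expected heterozygosity
    }
    if observed_het is not None and he > 0:
        # F > 0: heterozygote deficit; F < 0: heterozygote excess.
        result["F"] = 1.0 - observed_het / he
    return result


# A biallelic SNP with equal allele frequencies and no heterozygotes.
print(allelic_indices([0.5, 0.5], observed_het=0.0))
```

A negative F, as reported for the Iran group, arises whenever the observed heterozygosity exceeds He.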

Table 4 Genetic variation among three groups of 126 durum wheat landraces

Evaluation of linkage disequilibrium

Linkage disequilibrium (LD) analysis showed that LD decayed with genetic distance. The 23,334 pairs of SNPs in the tested genotypes showed an average r2 value of 0.224, suggesting that LD was not high (Table 5). We found that averaging LD within each genome, rather than measuring LD between two SNPs located on the same chromosome, was more useful for identifying the pattern of LD across the two genomes. Table 5 presents the average LD per chromosome, the total number of SNP pairs, and the number of significant SNP pairs located on the same chromosome. At the genome level, the A genome had the higher average LD (0.2501), while the B genome had an LD of 0.1978. LD ranged from 0.181 (2A) to 0.423 (6A) within the A genome and from 0.133 (1B) to 0.242 (5B) within the B genome. The majority of significant marker pairs were located at a distance of < 13,000,000 bp. The A genome had the highest number of significant marker pairs (262,768) and the B genome the lowest (222,240) (Table 5). LD decay in each genome and over the whole genome is shown in Fig. 7. The A genome showed slower LD decay than the B genome (Fig. 7). Haplotype blocks were analyzed on the three chromosomes with the highest LD. A total of 11 haplotype blocks were found on chromosome 6A, while 7 and 8 blocks were found on chromosomes 3A and 5B, respectively (Supplementary 2).

Table 5 Linkage disequilibrium between SNP markers located on the same chromosome and genome
Fig. 7

The rate of linkage disequilibrium (LD) decay of the genome A (a), genome B (b), and total (c) of the 126-tetraploid wheat based on the 23,334 SNP markers

Discussion

The suitability of SNP markers for studying the genetic diversity and population structure of durum wheat has been demonstrated [24, 25]. In this study, we therefore used a new SNP array to conduct a genome-wide analysis of SNP diversity in tetraploid wheat landraces. The higher proportion of SNPs identified in the B genome is consistent with previously reported results [34, 35]. Interestingly, chromosome 3B had the lowest SNP density (1.7 SNP/Mbp); Marcotuli et al. [36] also observed the lowest number of mapped markers on chromosome 3B. An abundance of transition-type SNPs is typical of true SNPs and reflects the frequent transition of cytosine to thymine through deamination of 5-methylcytosine following cytosine methylation [37]. The Ts/Tv ratio observed in this study is much higher than previously reported for wheat [20, 29, 32, 38,39,40], which suggests a higher methylation rate in the durum wheat genome.

Genetic diversity and PIC values are useful parameters for measuring polymorphism among genotypes used in breeding programs. PIC values for multi-allelic markers, such as SSR markers, usually range from 0 to 1.0. Botstein et al. [41] classified multi-allelic markers into three categories based on their PIC values: highly informative (PIC higher than 0.5), moderately informative (PIC between 0.25 and 0.5), and slightly informative (PIC less than 0.25). The average PIC value in our study was greater than the values reported by Ren et al. [30] and Alemu et al. [24], who investigated durum sets using SNP markers. A PIC value of this magnitude has been reported to be a good indicator of informative markers that can be used to study the genetic diversity of various organisms [42]. In contrast, Mazzucotelli et al. [27] and Baloch et al. [43] observed equal and higher PIC values, respectively, using the same marker type. Moragues et al. [44] investigated the genetic variation of 63 durum wheat landraces from Mediterranean countries using amplified fragment length polymorphism (AFLP) and simple sequence repeat (SSR) markers and reported PIC values of 0.24 for AFLP and 0.70 for microsatellites. Thus, in addition to the marker system, the germplasm studied also has a large effect on the PIC value, and PIC in landraces has been reported to be equal to [27] or lower than [24] that in cultivars and modern lines. The wide geographical distribution of the landraces in the present study is probably the reason for the higher PIC value compared with a similar study [24] that examined durum landraces from a single country. In addition to PIC, the GD and MAF of each marker across the diversity panel were evaluated. Chromosomes 6A and 2A had the lowest values of these indicators, which could be due to the impact of breeding programs and selection pressure [25]. Differences in the GD and MAF values of durum wheat chromosomes have been reported previously, with 2 and 7A having the lowest values [24]. Our results suggest that these markers explain the genetic diversity in tetraploid wheat well, given their PIC values and the good distribution of the SNP markers studied. They can be used in other genetic studies, including genome-wide association studies, to identify alleles associated with target traits.

Structure analysis classified the landraces into two main subgroups (K = 2). The membership coefficient was higher than 0.7 for 97% (122 of 126) of the samples. The multivariate methods, including WPGMA clustering, PCoA, and the Bayesian model-based clustering approach implemented in STRUCTURE, consistently assigned the landraces (99.2%) to one of these two primary subpopulations. As in the studies of Marzang et al. [45] and Salsman et al. [7], the durum wheat landraces were expected to group, at least in some clusters, according to their geographical pattern. However, structure analysis, PCoA, and WPGMA clustering did not separate the landraces by region of origin, so the grouping did not reveal a clear relationship between genetic diversity and geographical origin. Part of this result could be due to the historical exchange of germplasm, which has been reported in several studies [19, 24, 30, 46]. Because genetic distance plays a very important role in selecting parents for breeding programs, this information is crucial for choosing candidate parents. It may be unwise to use two genotypes as parents in breeding programs when the genetic diversity between them is very low, even if they come from two different countries on two different continents. It is therefore important to understand how the 126 tested tetraploid wheat genotypes relate to each other in terms of population structure. Taking this structure into account may also improve the association of GBS-derived SNPs with the studied traits in genome-wide association studies (GWAS) [47].

In the clustering based on country of origin, Iran was completely separated from the other countries and showed lower gene flow and higher genetic differentiation. Baloch et al. [43] also found that Syrian and Turkish durum wheat landraces were classified into the same group and noted that, about a hundred years ago, there was no distinct breeding program tailored to local consumer requirements in those regions. Bousba et al. [48] reported no particular association between genetic diversity and the geographic origin of durum wheat collections from various countries. Similarly, Haile et al. [49] evaluated a population consisting of 58 accessions and an advanced improved variety of tetraploid wheat using 31 neutral SSR markers and observed low variability among the released cultivars. Therefore, the dispersal and exchange of seeds among neighboring durum wheat-growing regions could also contribute to the observed higher within-population variation. This result is in accordance with previous reports [49,50,51]. Moragues et al. [44] suggested that the expansion of the Arabian Empire during the Middle Ages was a possible cause of the distribution of germplasm among various regions of the Mediterranean, leading to greater intra-population variation. The availability of multiple ancestral wheat populations may have led to a mixture of landrace alleles from multiple gene pools of Mediterranean tetraploid wheat accessions, which in turn produced admixture in that wheat [52]. Another possible process is gene flow among different varieties following the introduction of new genotypes into fields. It is clear that the association between genetic differentiation and geographical region is weak. Factors other than geographical origin can also affect genetic differentiation among durum wheat landraces. Ren et al. [30] showed that environmental factors, including temperature and water availability, individually or in combination with geographical factors, explained a substantial portion of SNP frequency variation in wild emmer sampled across a wide range of environmental conditions. The diversity indices for the Iranian durum wheat landraces were very low, as has been reported previously, and the genetic base of durum wheat in Iran needs to be broadened [45]. The negative F value in the Iran group indicates more heterozygotes than expected and an excess of outbreeding. Durum wheat from Turkey and Russia showed the highest diversity. Afghanistan also had good diversity despite the small sample size. Differences in the genetic diversity of durum wheat among countries are common [25, 27], which underscores the need for international cooperation in developing new cultivars.

AMOVA revealed that the two subpopulations had highly significant genetic diversity. Because wheat breeders in different countries have selected for specific traits, subpopulations can show high levels of genetic diversity. Additionally, each subpopulation contained wheat genotypes from different countries. The low genetic diversity in the populations might be attributed to the spread of wheat germplasm between different regions. As a result, selecting parents from the same subpopulation to improve target traits may be more effective than selecting genotypes from different subpopulations. Incorporating haplotypes from different founder populations, however, may require crosses between genotypes from different subpopulations. Both winter wheat and synthetic wheat genotypes have shown high genetic diversity within subpopulations but low genetic diversity among subpopulations [53, 54]. The level of gene flow between subpopulations was determined by calculating the haploid number of migrants (Nm). In general, an Nm (haploid) value of 1.00 or lower indicates a low level of gene flow [55]. We observed a very high level of gene flow between the subpopulations in our tested materials, with an Nm (haploid) of 2.300. This result is consistent with the distribution of genotypes from one country across the two subpopulations. Based on all the allelic pattern indices (Na, Ne, I, He, F and PPL) across the subpopulations, subpopulation II is the most diverse, as it shows the highest values of all the indices. This subpopulation is therefore expected to contain genotypes from more countries than the other subpopulation.

It is essential to determine the magnitude and decay of LD, as they affect the number of SNP markers and the mapping resolution required to conduct association studies [56]. The extent of LD varies widely among genomes and species. LD decay in wheat was analyzed separately for each of its two genomes. Based on the nonlinear logarithmic trend line, LD decay was estimated as the distance at which LD values declined below 0.1. LD decayed at greater distances in genome B than in genome A. The lowest rate of LD decay was observed on chromosome 1B. Consequently, GWAS should require fewer markers to detect QTLs located in genome B than QTLs located in genome A [57]. Because both high and low LD were found across the two genomes, there is a good chance of detecting QTLs with both large and small effects in the current materials [58]. Ayana et al. [59] and Larmer et al. [60] reported the same pattern of LD decay across the two wheat genomes. Each genome contained regions with high LD at high genetic distances. High-LD regions adjacent to low-LD regions are often referred to as LD hotspots. LD hotspot regions were more numerous in genome A than in genome B. Understanding the structure of LD and how LD hotspot regions are distributed within wheat genomes is therefore very important. Understanding the LD structure is necessary both to determine the marker density needed to associate genotypes with agronomic traits and to identify the genomic regions that control these traits [56]. LD hotspots provide useful information about the required marker density in the genome. Higher marker density becomes necessary when the recombination rate is high, because recombination is then more likely to break the LD between a QTL and a marker unless the two are close together [61]. In the LD plot covering the two genomes, hotspot genomic regions were clearly found at high genetic distances, separating the low-LD regions (Supplementary 2).

Conclusions

Estimating genetic heterogeneity plays a vital role in plant breeding programs. The current study provides a detailed report on the genetic diversity of tetraploid wheat landraces collected from various countries. The results indicated that the association between the geographical origins of tetraploid wheat landraces and their genetic differentiation is weak. The genetic diversity and differentiation determined here for durum wheat materials from diverse regions could therefore provide valuable information for broadening the genetic variation of breeding materials, for using the examined wheat resources more efficiently as parents to develop high-yielding durum wheat genotypes through breeding programs, and for association mapping studies.

Materials and methods

Plant materials

A set of 126 tetraploid wheat landraces (Supplementary 3), including accessions from Turkey (n = 59), Russia (n = 37), Azerbaijan (n = 7), Kazakhstan (n = 5), Ukraine (n = 5), Armenia (n = 5), Afghanistan (n = 3), China (n = 3), and Iran (n = 2), was used in the current study. These samples were kindly provided by the Dryland Agricultural Research Sub-Institute (DARSI), Agricultural Research, Education and Extension Organization (AREEO), Kermanshah, Iran.

DNA extraction, genotype-by-sequencing (GBS), and SNP calling

Genomic DNA was extracted from 2-week-old plantlets using a modified CTAB procedure [61], with 5 replications for each cultivar. DNA concentration was measured with the Quant-iT™ PicoGreen® dsDNA Assay (Life Technologies, Inc., Grand Island, NY, United States) and normalized to 20 ng/µl for library construction. The Affymetrix 55 K genotyping array (CapitalBio Technology Company, Beijing, China) was used to genotype the qualified DNA according to the Axiom® 2.0 Assay for 126 Samples User Manual. Low-quality SNPs (score < 15) were eliminated, and SNPs with heterozygosity < 10%, minor allele frequency (MAF) > 10%, and missing data < 10% were retained for further analysis. SNP flanking sequences were aligned to the reference genome (cv. Chinese Spring, IWGSC ver. 1.0) using BLASTn.
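
The marker-filtering step can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function name, the 0/1/2 genotype coding with None for missing calls, and the choice to compute heterozygosity and MAF over called genotypes only are assumptions. Only the three 10% thresholds are taken from the text.

```python
def snp_passes_qc(genotypes, max_missing=0.10, max_het=0.10, min_maf=0.10):
    """Decide whether one SNP is retained.

    genotypes: one value per sample, coded as the count of the
    alternative allele (0, 1 or 2), with None for a missing call
    (hypothetical coding chosen for this sketch).
    """
    n_total = len(genotypes)
    called = [g for g in genotypes if g is not None]
    if n_total == 0 or not called:
        return False
    missing_rate = 1.0 - len(called) / n_total
    het_rate = sum(1 for g in called if g == 1) / len(called)
    alt_freq = sum(called) / (2 * len(called))
    maf = min(alt_freq, 1.0 - alt_freq)
    return missing_rate < max_missing and het_rate < max_het and maf > min_maf


# Four toy markers scored on ten samples each.
markers = {
    "keep": [0] * 8 + [2] * 2,                               # MAF = 0.20
    "monomorphic": [0] * 10,                                 # MAF = 0
    "too_missing": [0, 2, None, None, 0, 2, 0, 2, 0, 2],     # 20% missing
    "too_heterozygous": [1] * 5 + [0] * 5,                   # 50% het
}
retained = [name for name, g in markers.items() if snp_passes_qc(g)]
print(retained)
```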

Data analysis

Genetic properties of markers

The polymorphic information content (PIC), minor allele frequency (MAF), percentage of heterozygosity, and gene diversity of all 23,334 SNP markers were calculated using PowerMarker software V 3.25 [62]. To calculate the PIC, we used the following formula [41].

$$PIC = 1 - \sum_{j=1}^{n} P_{ij}^{2} - \sum_{j=1}^{n-1} \sum_{k=j+1}^{n} 2 P_{ij}^{2} P_{ik}^{2}$$

where Pij and Pik are the frequencies of the jth and kth alleles of marker i, respectively, and n is the number of alleles.
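
As a worked illustration of the PIC formula, the sketch below evaluates it, together with gene diversity (1 − Σp²), for the allele frequencies of one marker. The function names are hypothetical and the code is independent of PowerMarker.

```python
from itertools import combinations


def gene_diversity(freqs):
    """Gene diversity of one marker: 1 - sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def pic(freqs):
    """Polymorphism information content (Botstein et al. formula).

    freqs: frequencies of all n alleles of one marker (summing to 1).
    """
    # Double sum over unordered allele pairs j < k of 2 * p_j^2 * p_k^2.
    cross_term = sum(2.0 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return gene_diversity(freqs) - cross_term


# Biallelic SNPs with balanced and unbalanced allele frequencies.
for freqs in ([0.5, 0.5], [0.9, 0.1]):
    print(freqs, round(gene_diversity(freqs), 4), round(pic(freqs), 4))
```

For a biallelic SNP, both quantities peak at equal allele frequencies, where gene diversity is 0.5 and PIC is 0.375.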

Analysis of population structure

STRUCTURE version 2.3.4 was used to analyze population structure by Bayesian cluster analysis [63], with all parameters set to their default values. The analysis was run 10 times for every K value (K = 1 to 10) under an admixture model, applying 30,000 steps for the MC and burn-in period [58]. The ad hoc statistic ΔK, based on the rate of change in the log probability of the data between successive K values, was used to estimate the most likely number of clusters (K) [64]. Samples with a membership probability ≥ 0.50 were assigned to the corresponding groups [65].
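
The ΔK statistic can be sketched as follows. The version below uses the common formulation in which the absolute second-order difference of the mean LnP(D) across replicate runs is divided by the standard deviation of LnP(D) at that K; this is an assumption about the exact variant, and the LnP(D) values are invented for illustration and are not the study's results.

```python
from statistics import mean, stdev


def delta_k(lnp_by_k):
    """Compute Delta K for every K that has both neighbours K-1 and K+1.

    lnp_by_k: dict mapping K to a list of LnP(D) values, one per
    replicate STRUCTURE run.
    """
    means = {k: mean(values) for k, values in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 not in means or k + 1 not in means:
            continue  # Delta K is undefined at the ends of the K range
        sd = stdev(lnp_by_k[k])
        if sd == 0:
            continue  # avoid division by zero when replicates are identical
        second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
        result[k] = second_diff / sd
    return result


# Hypothetical LnP(D) values for three replicate runs per K.
example = {
    1: [-1000.0, -1002.0, -998.0],
    2: [-800.0, -801.0, -799.0],
    3: [-780.0, -790.0, -770.0],
    4: [-770.0, -785.0, -755.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)
print(dk, best_k)
```

In this toy data set, LnP(D) keeps rising with K, but the sharp change in slope at K = 2 gives that value the largest ΔK.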

Analysis of molecular variance (AMOVA) and genetic diversity indices

Genetic dissimilarity was assessed with DARwin version 6.010 software [66] based on the Jaccard index. WPGMA and the Neighbor-Joining algorithm [67] were used to build the diversity trees. This algorithm produces unrooted trees under the assumption that mutation rates are equal over time and space. To assess the confidence of the genetic distances among the investigated individuals, 1000 bootstrap replicates were performed; the results are shown as percentages at the main nodes of each branch. To partition the genetic differences into within- and among-group components, analysis of molecular variance (AMOVA) was performed using the pegas package in R [68].
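
As an illustration of the Jaccard index, the sketch below computes the dissimilarity between two accessions from binary marker profiles. How the SNP calls were converted to presence/absence data is not described in the text, so the 0/1 coding and the function name are assumptions, and the code does not reproduce DARwin's implementation.

```python
def jaccard_dissimilarity(profile_x, profile_y):
    """Jaccard dissimilarity between two binary (0/1) marker profiles.

    Shared absences (0, 0) are ignored: d = 1 - a / (a + b + c), where
    a counts positions scored 1 in both profiles and b + c counts the
    positions scored 1 in exactly one of them.
    """
    if len(profile_x) != len(profile_y):
        raise ValueError("profiles must have the same length")
    pairs = list(zip(profile_x, profile_y))
    both = sum(1 for x, y in pairs if x == 1 and y == 1)
    either = sum(1 for x, y in pairs if x == 1 or y == 1)
    if either == 0:
        return 0.0  # no informative positions; treat as identical
    return 1.0 - both / either


# Two hypothetical accessions scored at four markers.
print(jaccard_dissimilarity([1, 1, 0, 0], [1, 0, 1, 0]))
```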

Linkage disequilibrium (LD) structure

LD between SNPs was estimated in TASSEL V.5 using observed/expected allele frequencies. LD distribution was estimated for each subpopulation and for the whole association panel (WAP) using the full matrix option. Pairwise LD was calculated as the squared correlation coefficient of alleles (r2) because it is less sensitive to marginal allele frequencies. In addition, LD decay was calculated for each chromosome and sub-genome based on the theoretical expectation of r2 (see [69] for details).
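
The r2 statistic itself is straightforward to compute for a pair of biallelic SNPs. The sketch below assumes haplotype-like data, that is, fully homozygous lines coded 0/1 at each locus with None for missing calls; this coding and the function name are assumptions, and the code is not TASSEL's implementation.

```python
def r_squared(locus_a, locus_b):
    """Squared allelic correlation (r^2) between two biallelic loci.

    locus_a, locus_b: per-line allele codes (0 or 1, None if missing),
    given in the same line order for both loci.
    """
    if len(locus_a) != len(locus_b):
        raise ValueError("loci must be scored on the same lines")
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines with calls at both loci")
    p_a = sum(a for a, _ in pairs) / n           # frequency of allele 1 at A
    p_b = sum(b for _, b in pairs) / n           # frequency of allele 1 at B
    p_ab = sum(1 for a, b in pairs if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined for a monomorphic locus")
    d = p_ab - p_a * p_b                         # disequilibrium coefficient D
    return d * d / denom


print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly linked toy loci
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # independent toy loci
```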

Haplotype block analysis

The number of haplotype blocks in each genome was determined with Haploview 4.2 on the chromosome with the highest percentage of significant LD [70]. Pairwise LD between SNPs was calculated from the SNP data of the target chromosome. Haplotype blocks were constructed with the four-gamete method using a cutoff of 1% [71,72,73].
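
The idea behind the four-gamete method can be sketched for a single pair of SNPs: if all four two-locus gametes occur at or above the frequency cutoff, a historical recombination event is inferred and the two SNPs should not share a block. The code below is an illustration under the same haplotype-like 0/1 coding assumption as above; the function name is hypothetical, and it does not reproduce Haploview's block-building procedure.

```python
from collections import Counter


def passes_four_gamete_test(locus_a, locus_b, cutoff=0.01):
    """Return True if fewer than four gametes reach the cutoff frequency.

    True means there is no evidence of historical recombination between
    the two SNPs, so they may belong to the same haplotype block.
    """
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines with calls at both loci")
    counts = Counter(pairs)
    observed = [gamete for gamete, c in counts.items() if c / n >= cutoff]
    return len(observed) < 4


# All four gametes are common: recombination is inferred.
print(passes_four_gamete_test([0, 0, 1, 1], [0, 1, 0, 1]))
# Only two gametes occur: the SNPs can share a block.
print(passes_four_gamete_test([0, 0, 1, 1], [0, 0, 1, 1]))
```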