Assessment of the genetic diversity, population structure and allele distribution of major plant development genes in bread wheat cultivars using DArT and gene-specific markers

Knowledge of the degree of genetic diversity can provide fundamental information to breeders for use in various breeding programmes, for instance for the selection of lines with better adaptability. The genetic diversity analysis of 188 winter wheat genotypes demonstrated that this group of cultivars could be divided into four clusters based primarily on geographical origin. The first cluster contained mostly American and Asian cultivars, while cluster 2 consisted of Central European cultivars, cluster 3 of Hungarian and South European cultivars and cluster 4 mainly of cultivars from Western Europe. Cultivars used in Central and South East European breeding programmes were found in all four clusters. Wheat genotypes originating from this region of Europe proved to have greater genetic variability than lines from Western and Northern Europe. Among the four clusters, there were also differences in the frequencies of winter–spring alleles in the Vrn-A1, Vrn-B1 and Vrn-D1 vernalisation response genes and in the frequencies of sensitive–insensitive alleles in the Ppd-B1 and Ppd-D1 photoperiod response genes, which also explained the differences in heading date among the four clusters.


Introduction
Breeding programmes will only be successful if the cultivars have sufficient genetic diversity, since this improves the chance that promising traits will be manifested, thus ensuring the selection of lines with better adaptability (Bouffier et al. 2008; Karsai et al. 2012; Orabi et al. 2014). Theoretically, intensive targeted selection may reduce the number of rare alleles, with a consequent decline in genetic diversity (Tanksley and McCouch 1997; Fu et al. 2006; Lopes et al. 2015). This may result in the loss of numerous traits that could be used to improve agronomic and quality characters or to breed genotypes better adapted to various environmental stress factors, for example through resistance to pathogens or tolerance of the extreme conditions caused by climate change (Smale 1997; Chao et al. 2008; Charmet 2011; Rauf et al. 2010; Tester and Langridge 2010; van de Wouw et al. 2010; Benson et al. 2012; Lopes et al. 2015). Knowledge of the genetic diversity existing between various breeding lines could therefore provide important information for breeders. Roussel et al. (2005) demonstrated an increase in the degree of genetic similarity between European wheat cultivars, but a difference could be observed between the geographical regions of Europe as regards the allele distribution patterns, with a lower number of allele variations in Western Europe than in Central and South Eastern Europe (Roussel et al. 2005; Balfourier et al. 2007; Hai et al. 2007; Zhang et al. 2011; Kiss et al. 2014; Novoselović et al. 2016; El-Esawi et al. 2018). This was confirmed by allele diversity analyses performed using SSR markers, in which a lower allele number was detected in Western European countries (France, the Netherlands, Great Britain, Belgium) compared with South Eastern Europe (countries previously part of Yugoslavia, Greece, Bulgaria, Romania, Hungary) and the Mediterranean regions (Italy, Spain, Portugal) (Roussel et al. 2005; Balfourier et al. 2007).
This can be explained by the diverse environmental conditions and soil factors and by the differences in breeding practices (Stachel et al. 2000; Huang et al. 2002; Roussel et al. 2005; El-Esawi et al. 2018). Nevertheless, many authors have published contradictory results. Huang et al. (2007) used 42 microsatellite markers to analyse 511 Central and Northern European cultivars widely cultivated between 1945 and 2000 and found no significant quantitative decrease in genetic diversity. Similar results were reported by White et al. (2008) in a study on the genetic diversity of British, American and Australian cultivars. Due to the increasing importance of this topic, future papers will hopefully help to clarify the contradictions currently found in the international literature.
Genetic diversity can be characterised either indirectly, by estimating genetic distances from pedigree data or by determining morphological and phenotypic traits, or directly, by comparing differences in the DNA sequences of the genotypes using molecular markers (Astarini et al. 2004; Fufa et al. 2005; Zhang et al. 2011; Spanic et al. 2016). Indirectly acquired information is not always sufficiently reliable for genetic characterisation due to deficiencies in the pedigree data, the disregard of natural and artificial selection or of mutations, and the environment-dependent variability of phenotypic traits (Parker et al. 2002; Almanza-Pinzón et al. 2003; Reif et al. 2005). By contrast, molecular marker systems are reliable, as they are not influenced by environmental, pleiotropic or epistatic effects, and DNA extracts isolated from any type of tissue in any phenophase can be used for the analysis (Mukhtar et al. 2002; Fufa et al. 2005; Karsai et al. 2012; El-basyoni et al. 2013). Over the last decade, there has been a great increase in the number of genetic diversity studies performed using a reliable marker system (RFLP, AFLP, RAPD, SSR, SNP or DArT) on various plant species (Siedler et al. 1994; Röder et al. 2002; Khan et al. 2005; Stodart et al. 2005; Roussel et al. 2005; White et al. 2008; Benson et al. 2012; Matthies et al. 2012; Nielsen et al. 2014; Kabbaj et al. 2017; El-Esawi et al. 2018). The Diversity Arrays Technology (DArT) is a rapid, cost-effective high-throughput marker system for whole genome analysis, which does not require preliminary knowledge of the sequences. This has made it one of the most widely used marker technologies for genetic analysis (Jaccoud et al. 2001; Zhang et al. 2011; Ficco et al. 2012; El-Esawi et al. 2018). The first wheat genome association map prepared with this method was published by Crossa et al. (2007). The DArT marker system has now been applied to numerous species, including cereals such as barley (Hordeum vulgare L.), wheat (Triticum aestivum L.) and durum wheat (Triticum durum L.) (Zhang et al. 2011). This technology can be used not only to prepare high-resolution genetic maps and perform association analyses, but also to help scientists obtain a better understanding of the extent of genetic diversity and of population genetics correlations (Semagn et al. 2006; Crossa et al. 2007; White et al. 2008; Raman et al. 2010; Zhang et al. 2011).
The distribution pattern of allele variants of the major genes responsible for the environmental adaptation of wheat (the VRN: vernalisation requirement and PPD: photoperiod sensitivity genes responsible for the vegetative-generative transition) could be especially important for the toleration of climatic extremes, because the length and timing of the vegetative-generative transition can be regulated by altering the allele combination of these genes, and this could be of decisive importance in the achievement of satisfactory yield potential (González et al. 2005; Borràs et al. 2009; Chen et al. 2009, 2010; Kiss et al. 2014). One of the most important steps in the adaptation process is the heading date, which is determined to a significant extent by these genes (Dubcovsky et al. 1998; Worland 1996). In the case of wheat, several gene families are involved in the genetic control of the vernalisation requirement, among which the Vrn-A1, Vrn-B1 and Vrn-D1 genes have the greatest effect (Pugsley 1971, 1972). Depending on the ratio of dominant and recessive alleles of the Vrn genes in the three genomes of hexaploid wheat, a distinction can be made between winter (recessive) and spring (dominant) cultivars, while genotypes with a facultative growth habit have various combinations of dominant and recessive alleles. The influence of the dominant alleles of these genes in spring forms means that the plants require little or no cold treatment for the transition to the generative stage (Pugsley 1971, 1972; Kato et al. 2001; Loukoianov et al. 2005). In wheat, the most influential genes in the regulation of photoperiod sensitivity are Ppd-D1 and Ppd-B1 (Law et al. 1978; Börner et al. 1993). The dominant photoperiod-insensitive allele type results in early flowering under short-day conditions, while the heading of genotypes carrying the photoperiod-sensitive allele type is retarded or may not take place at all under such conditions (Worland 1996).
In the case of the Vrn-A1, Vrn-B1, Vrn-D1 and Ppd-D1 genes, length polymorphism in the promoter, intron and exon regions forms the genetic background of the spring or photoperiod-insensitive allele types (Beales et al. 2007), while the photoperiod-insensitive allele of Ppd-B1 can be attributed to the enhanced quantity of gene products due to the presence of extra gene copies (Díaz et al. 2012). In addition, the intercopy structure between the duplicated copies of Ppd-B1 was also found to have a significant influence on the level of photoperiod sensitivity, the effect of which can be detected in heading dates under field conditions, especially in association with copy number variation (CNV) (Kiss et al. 2014). Under field conditions, however, the phenotypic effects of the various alleles of these genes exhibit considerable variability resulting from the differing environmental effects experienced in different years, leading to contradictory results and to a dearth of information on the frequency distribution of the allele combinations of the three VRN1 and the two PPD1 genes (Snape et al. 1985; Worland 1996; Blake et al. 2009; Andeden et al. 2011; Díaz et al. 2012).
The main aim of the present work was to determine (1) the degree of genetic diversity and (2) the population structure of wheat cultivars of diverse origin, and (3) the allele distribution of each subgroup of the major genes responsible for the environmental adaptability of wheat (VRN and PPD). A high-resolution marker technology (DArT), the determination of copy number (for the PPD-B1 gene) and allele-specific markers for the major plant development genes (VRN-A1, VRN-B1, VRN-D1, PPD-B1, PPD-D1) were used in the study. It is hoped that the resulting knowledge on genetic diversity and population structure will be of use in future breeding programmes.

Materials and methods
The 188 wheat genotypes included in the experiment were obtained from the cereal gene bank of the ATK Agricultural Institute and were chosen on the basis of previous flowering data (Supplemental Table 1). The samples include both old cultivars that were previously widely cultivated and newly bred genotypes that are of importance for today's wheat production. The field heading date (Z59: spike out of the flag leaf sheath, scored according to Tottman and Makepeace 1979) of the genotypes was observed in three consecutive years (2013-2015) in Martonvásár on chernozem soil with average N, P₂O₅ and K₂O contents. The heading date was determined as the number of days from January 1st, corrected with the mean vernalisation requirement and photoperiod (effective heat sum) of the plants, using the method of Bogard et al. (2015). The effective heat sum is the sum of the mean daily temperatures after the saturation of the vernalisation requirement, modified by the day length, i.e. the total heat quantity obtained by the plant up to the given phenophase (SPTV = TT × FV × FP, where TT is the total daily heat sum, FV the vernalisation factor and FP the photoperiod factor).
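The effective heat sum relation above can be illustrated with a minimal sketch. It assumes that the product TT × FV × FP is accumulated day by day, that FV and FP are supplied as daily factors between 0 and 1, and that thermal time is counted above a base temperature of 0 °C; none of these details is specified in the text, and the function name and parameters are hypothetical rather than taken from Bogard et al. (2015).

```python
def effective_heat_sum(daily_mean_temp, fv, fp, base_temp=0.0):
    """Accumulate TT x FV x FP over a series of days.

    daily_mean_temp: mean daily temperatures in degrees C
    fv: daily vernalisation factors (assumed to lie between 0 and 1)
    fp: daily photoperiod factors (assumed to lie between 0 and 1)
    base_temp: assumed base temperature below which no heat is counted
    """
    if not (len(daily_mean_temp) == len(fv) == len(fp)):
        raise ValueError("all three series must cover the same days")
    total = 0.0
    for temp, v, p in zip(daily_mean_temp, fv, fp):
        thermal_time = max(temp - base_temp, 0.0)  # no negative heat units
        total += thermal_time * v * p
    return total
```

With fully saturated vernalisation and no photoperiod limitation (both factors equal to 1), the result reduces to the plain sum of daily temperatures above the base temperature.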
The fresh shoot samples (100 mg) used for the extraction of genomic DNA were ground in liquid nitrogen and processed using buffers from the DNeasy® Plant Mini Kit (Qiagen) according to the manufacturer's instructions. A list of the gene-specific primers used to evaluate the allele composition of each wheat genotype in the vernalisation (Vrn1) and photoperiod (Ppd1) response genes is given in Appendix S2, together with the relevant references. In the case of the PPD-B1 gene, the copy number variation (CNV) was determined with the Multiplex TaqMan® Assay at iDna Genetics Ltd. (Norwich Research Park, Norwich, UK) (Díaz et al. 2012), in addition to checking for the presence of the Chinese Spring type intercopy structure, which also influences the level of photoperiod sensitivity in this gene (Kiss et al. 2014). The DArT analysis of the DNA samples was performed by the Triticarte subsidiary of Diversity Arrays Technology Pty Ltd. (CSIRO, 1 Wilf Crane Crescent, Yarralumla, ACT 2600, Australia) (http://www.triticarte.co.au). The polymorphism information content (PIC) values of the individual DArT markers were calculated with the formula $\mathrm{PIC} = 1 - \sum_i P_i^2$, where $P_i$ is the frequency of the $i$th allele at the given locus (Anderson et al. 1993). A binary matrix was then created from the DArT data, where the presence or absence of the specific marker alleles was designated as 1 or 0. A difference matrix was then compiled from the binary data using Jaccard's distance coefficient [$\mathrm{JDC} = 1 - a/(n - d)$, where $a$ is the number of marker fragments common to the two genotypes, $n$ is the total number of marker fragments and $d$ is the number of marker fragments missing from both genotypes]. The comparison of individual pairs was performed using a 0-1 scale, where 0 represented complete identity and 1 complete difference.
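The PIC and Jaccard distance calculations described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the software used in the study: each DArT marker is treated as having two states (1 = present, 0 = absent), missing scores are not handled, and the function names are hypothetical.

```python
def pic(scores):
    """PIC = 1 - sum(P_i^2) for one marker scored 1 (present) or 0 (absent)."""
    p_present = sum(scores) / len(scores)
    p_absent = 1.0 - p_present
    return 1.0 - (p_present ** 2 + p_absent ** 2)


def jaccard_distance(g1, g2):
    """JDC = 1 - a / (n - d) for two genotypes scored over the same markers."""
    n = len(g1)
    a = sum(1 for x, y in zip(g1, g2) if x == 1 and y == 1)  # shared presences
    d = sum(1 for x, y in zip(g1, g2) if x == 0 and y == 0)  # shared absences
    if n == d:
        # No marker is present in either genotype; treating the pair as
        # identical is a convention chosen for this sketch.
        return 0.0
    return 1.0 - a / (n - d)


def distance_matrix(genotypes):
    """Pairwise Jaccard distances for a dict {name: list of 0/1 scores}."""
    names = list(genotypes)
    return {(i, j): jaccard_distance(genotypes[i], genotypes[j])
            for i in names for j in names}
```

Under this two-state assumption the PIC of a marker cannot exceed 0.5, a value reached when presence and absence are equally frequent in the panel.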
The relationship matrix was determined with the UPGMA (unweighted pair group method using the arithmetic mean) module of TASSEL 3.0 (trait analysis by association, evolution and linkage), and this was used to compile dendrograms. The population structure was determined with the help of the STRUCTURE 2.3.4 program (Pritchard et al. 2000) on the data matrix of 184 genotypes × 249 markers. To predict the most probable number of subgroups (K) within the population, values of K from 1 to 10 were tested in 10 independent runs at each step using a burn-in period of 100,000. The most probable subgroup number was identified as 4, where ΔK was the highest. Using the Q values (ranging from 0 to 1), which indicate the probability with which a genotype can be assigned to a particular subgroup (Q matrix), all the cultivars were assigned to one of the 4 subgroups. Principal component analysis (PCA) was performed with the help of the Statistica 6 software package (StatSoft Inc., Tulsa, OK, USA) on the data matrix of the mean values of the three-year heading data (Z59 developmental phase) and the Q values of the studied genotypes. In order to examine whether the geographic origin has any role in the formation of the population structure, cultivars were grouped into six large geographic regions based on their origin. These were European, Asian, American, Australian and African. Rank correlation was then carried out between the group numbers of geographic regions and the Q values of population structure.
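The choice of the subgroup number from ΔK can be sketched as follows. The sketch assumes that ΔK refers to the statistic of Evanno et al. (2005), which the text does not state explicitly, and it computes the second-order difference from the mean ln P(D) of the replicate runs at each K. The function names and the input layout are hypothetical.

```python
from statistics import mean, stdev


def delta_k(ln_prob_by_k):
    """Return {K: delta K} for every K that has both neighbours K-1 and K+1.

    ln_prob_by_k maps each tested K to the list of ln P(D) values
    reported by the replicate runs at that K (at least two runs each).
    """
    result = {}
    for k, runs in ln_prob_by_k.items():
        if k - 1 not in ln_prob_by_k or k + 1 not in ln_prob_by_k:
            continue  # delta K is undefined at the ends of the tested range
        spread = stdev(runs)
        if spread == 0:
            continue  # avoid division by zero when all runs agree exactly
        second_diff = (mean(ln_prob_by_k[k + 1])
                       - 2 * mean(runs)
                       + mean(ln_prob_by_k[k - 1]))
        result[k] = abs(second_diff) / spread
    return result


def most_probable_k(ln_prob_by_k):
    """Pick the K with the highest delta K."""
    scores = delta_k(ln_prob_by_k)
    return max(scores, key=scores.get)
```

Because ΔK needs both neighbouring values of K, a tested range of 1 to 10 yields a ΔK value only for K = 2 to 9.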

Genetic diversity and population structure of the hexaploid wheat panel
The analysis of the population structure and genetic diversity of the association panel was based on DArT markers. Among the 4606 DArT markers, 1642 proved to be polymorphic among the 188 wheat cultivars, of which the chromosomal localisation of 970 is known. To examine the population structure of the tested cultivars, 249 of these markers were selected; these were independent of each other, exhibited well-balanced segregation frequency and were evenly spaced over the genome. The genotypes could be divided into four clusters (Figs. 1, 2) using the Q matrix method in the STRUCTURE program. The Q values, which range from 0 to 1, indicate the probability with which a genotype can be assigned to a particular cluster. The higher this number, the more reliable the genotype assignment. All 970 of the DArT markers with known chromosomal localisation were used for the genetic diversity analysis. Five of the wheat cultivars did not give a measurable response, so these were omitted from further analysis.
The first cluster contained 26 American, 9 European, 14 Asian, 3 Australian and 2 African cultivars, while there were 2 American, 52 European and 3 Asian genotypes in the second cluster. The third cluster consisted of 24 European genotypes, together with 5 from Asia and 1 from America, while 42 European cultivars were assigned to the fourth cluster, with 1 from Asia and 1 from America. The fourth cluster mainly contained genotypes bred in Western (British, French) and Central Europe (Germany, Austria, Switzerland). There was a higher percentage of Hungarian lines in the second and third clusters (59% and 32%), while they made up only 10.5% of the first and 14.6% of the fourth cluster. Cultivars from Eastern and South Eastern Europe were found with greater frequency in clusters 2 and 3 (18.5% and 38.7%), which could be related to the fact that genotypes from these regions serve as crossing material in each other's breeding programmes. The degree of diversity within the four clusters was fairly similar and quite high; only in cluster 2 could distinct subgroups be detected (Fig. 1). A comparison of genetic distances revealed a medium negative significant correlation between the origin of the genotypes and the population structure (r = -0.54; P ≤ 0.001).

Allele distribution of major developmental genes
An analysis was made of the allele distribution frequency of the major plant development genes (VRN and PPD) in genotypes in each of the four clusters (Supplemental Table 1). The spring alleles of the VRN-A1, VRN-B1 and VRN-D1 genes were present in 30%, 54% and 17% of the genotypes, respectively, in the first cluster, while these figures were 2%, 7% and 9% in cluster 2, 3%, 14% and 7% in cluster 3 and 14%, 11% and 0% in cluster 4. The photoperiod-insensitive allele types of PPD-D1 and PPD-B1 were carried by 43% and 30%, respectively, of the genotypes in cluster 1, while these ratios were 55% and 11% for cluster 2, 86% and 58% for cluster 3 and 25% and 13% for cluster 4. The distribution of the copy number of the PPD-B1 gene in the genotypes was examined within each cluster, and the distribution of the intercopy structure of this gene detected in 'Chinese Spring' was also investigated. In the first and third clusters, 30% and 58%, respectively, of the genotypes carried more than one copy of this gene, while these figures were only 9% and 14%, respectively, for clusters 2 and 4. In the third cluster, the 'Chinese Spring' intercopy structure was observed in 59% of the genotypes, while it was present in 7% of the first cluster and 5% of the second. This allele could not be detected in wheat genotypes in cluster 4. The lower ratio of the spring alleles in the three VRN1 genes together with the higher ratio of photoperiod sensitivity alleles in the two PPD1 genes could explain the later heading date of genotypes in the fourth cluster. The genotypic results were in agreement with phenotypic observations, as the mean heading dates determined for the genotypes in each cluster showed that, averaged over three years, plants in clusters 1, 2 and 3 reached this phenophase more rapidly than those in cluster 4. This is illustrated in terms of the mean values of the effective heat sum (ETT) required to reach the Z59 stage in Fig. 3.

Discussion
The results of genetic diversity studies showed that the tested cultivars could be divided into four clusters, primarily on the basis of geographical origin. The first cluster consisted mainly of American and Asian cultivars, cluster 2 of Central European, cluster 3 of Hungarian and South European and cluster 4 of West European cultivars. The cultivars used in breeding programmes in Central and South Eastern Europe were found in all four clusters, confirming the findings of other authors, who reported that the wheat genotypes in this region of Europe had greater genetic diversity than their Western and Northern European counterparts (Roussel et al. 2005; Hai et al. 2007; Zhang et al. 2011; Novoselović et al. 2016; El-Esawi et al. 2018). This could be attributed to the diverse environmental and soil conditions and to differences in breeding techniques (Roussel et al. 2005). The isolation caused by the Alps and the Carpathians also played an important role in the separation of relationships between Western and South Eastern genotypes (Roussel et al. 2005). The great genetic diversity observed in Hungarian breeding materials could partly be explained by the unusual climatic conditions in Hungary, which is located at the meeting point of three climatic zones, the Oceanic (cool summer, mild winter, small annual heat fluctuation, uniform precipitation distribution), the Continental (warm summer, cold, dry winter, large annual heat fluctuation, precipitation maxima in early summer and autumn) and the Mediterranean (hot, dry summer, mild winter, precipitation maxima in autumn and winter). Each year the weather components of these three zones are experienced with different intensity and frequency, and very often in mixed or overlapping forms, thus exerting a high level of selection pressure during breeding. This means that only lines with a combination of genes and alleles that makes them capable of adapting to such diverse conditions become state-registered cultivars.
In setting up this multi-varietal wheat population based on the heading date characteristics, our aim was to study the genetic diversity of wheats with winter growth habit. It is thus interesting to note that relatively high numbers of spring alleles were identified in the three Vrn1 vernalisation genes. There were, however, remarkable differences between the four clusters in the ratio of spring alleles, which were primarily found in the American and Asian cultivars and were thus present in a much higher ratio in cluster 1 than in the other clusters. The photoperiod-sensitive alleles of the PPD-B1 and PPD-D1 genes were detected in a much greater number in samples assigned to cluster 4. A close correlation was revealed between this recessive allele type and later heading date (Laurie et al. 1995; Turner et al. 2005; Beales et al. 2007; Díaz et al. 2012; Kiss et al. 2014). The genotypes in clusters 1 and 3, on the other hand, contained a higher proportion of the photoperiod-insensitive alleles of the PPD1 genes, which led to a substantial reduction in the time required to reach the Z59 phenophase. These two clusters also contained a higher ratio of genotypes carrying more than one copy of the PPD-B1 gene. The absence of the 'Chinese Spring' intercopy structure of this gene from the genotypes in cluster 4 was also correlated with later heading, as the point mutation detected in the Chinese Spring cultivar exhibits cosegregation with the early heading phenotype (Beales et al. 2007; Díaz et al. 2012; Kiss et al. 2014).
Fig. 3 Average effective thermal time (ETT) required by each cluster to reach the Z59 developmental phase, obtained by analysing the population structure of a winter wheat panel containing 188 genotypes (Martonvásár, 2013-2015)
One precondition for successful breeding is the presence of a satisfactory level of genetic diversity, which promotes the chances that favourable properties will be manifested, thus ensuring the selection of lines with better adaptability. For this reason, knowledge of the extent of genetic diversity, the allele distribution of the major plant development genes influencing yield potential and their relationship with the phenotype of breeding lines could provide breeders with extremely important information.