Genome-wide association mapping in elite winter wheat breeding for yield improvement

Increased grain yield (GY) is the primary breeding target of wheat breeders. We performed the genome-wide association study (GWAS) on 168 elite winter wheat lines from an ongoing breeding program to identify the main determinants of grain yield. Sequencing of Diversity Array Technology fragments (DArTseq) resulted in 19,350 single-nucleotide polymorphism (SNP) and presence-absence variation (PAV) markers. We identified 15 main genomic regions located in ten wheat chromosomes (1B, 2B, 2D, 3A, 3D, 5A, 5B, 6A, 6B, and 7B) that explained from 7.9 to 20.3% of the variation in grain yield and 13.3% of the yield stability. Loci identified in the reduced genepool are important for wheat improvement using marker-assisted selection. We found marker-trait associations between three genes involved in starch biosynthesis and grain yield. Two starch synthase genes (TraesCS2B03G1238800 and TraesCS2D03G1048800) and a sucrose synthase gene (TraesCS3D03G0024300) were found in regions of QGy.rut-2B.2, QGy.rut-2D.1, and QGy.rut-3D, respectively. These loci and other significantly associated SNP markers found in this study can be used for pyramiding favorable alleles in high-yielding varieties or to improve the accuracy of prediction in genomic selection. Supplementary Information The online version contains supplementary material available at 10.1007/s13353-023-00758-8.


Introduction
Cultivating common wheat (Triticum aestivum L.) provides about 20% of the total calories used by the human population (Rasheed et al. 2018). Worldwide wheat harvest area exceeds 213 million ha, and about 28% of this area is located in Europe, including over 2.3 million ha in Poland (FAOSTAT 2022). However, there are limitations to the territorial expansion of wheat cultivation, and to meet the challenge of doubling the wheat yield by 2050 (Rasheed et al. 2018), significant yield increase per unit of area is required. To meet this challenge, increased genetic diversity deposited in landraces (Vikram et al. 2016), synthetic wheat varieties , and wild relatives (Rasheed et al. 2018) needs to be identified and exploited in modern wheat cultivars, besides agronomical practices for yield improvement. The sequencing of the 17 Gb allohexaploid wheat (AABBDD) genome of Chinese Spring paved the way for genome-wide association studies (GWAS) and genomic selection in common wheat (Lukaszewski et al. 2014;Appels et al. 2018).

3
The wheat reference sequence provided a physical framework for mapping previously developed genetic markers with known sequences , and marker sequences deposited in databases can be used to find regions with target genes (Tyrka et al. 2021b). Hybridization arrays or next-generation sequencing (NGS) are the most common ways to find single-nucleotide polymorphisms (SNPs) and presence-absence variations (PAVs). With the continuous development of new high-throughput NGS methods, the application of genotyping by sequencing (GBS) technologies (e.g., DArTseq) is considered to be the cost-efficient genotyping alternative (Jia et al. 2018) for genomics-based breeding (Poland et al. 2012). GBS gives the genetic information needed to determine economically significant marker-trait associations and develop new wheat cultivars.
Two main approaches to dissecting the genetic basis of complex quantitative traits in crop plants are genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping. Many QTLs associated with yield-related traits in bread wheat have been identified in biparental populations (Jin et al. 2020;Isham et al. 2021;Kang et al. 2021;Li et al. 2022). At present, GWAS has become more frequently used as it allows for the identification of parts of the complex, essential traits valid in a studied panel of genotypes (Neumann et al. 2011;Sukumaran et al. 2018;Liu et al. 2018;Garcia et al. 2019;Qaseem et al. 2019;Sheoran et al. 2019;Akram et al. 2021). Regions associated with grain yield and its component traits in wheat have been identified in drought and irrigated production conditions (Golabadi et al. 2011;Neumann et al. 2011;Assanga et al. 2017;Bhusal et al. 2017;Li et al. 2019;Khan et al. 2022). Haplotypes found in GWAS (Sehgal et al. 2020) linked to GY are also needed to map candidate genes (Nadolska-Orczyk et al. 2017).
Increased GY is the primary breeding purpose of wheat. The environment strongly influences GY and can be dissected into numerous traits related to phenology and kernel development. Major QTLs significantly associated with yield are currently the target for cloning, and comparative analysis of yield-related traits revealed 145 meta-QTLs and candidate genes (Yang et al. 2021). One of the leading environmental factors influencing yield is nitrogen availability. Nitrogen fertilizer, often used to increase production per unit area, can cause lodging. In wheat, lodging generally occurs after the flowering stage and can affect both the grain yield and the quality of the wheat. Lodging can also be caused by environmental factors, diseases, or pests affecting stems or roots (Keller et al. 1999). In wheat, stem characteristics such as material strength based on lignin concentration and stem thickness play a role (Berry et al. 2007;Berry and Berry 2015;Dreccer et al. 2020Dreccer et al. , 2022Piñera-Chavez et al. 2016).
Depending on the population studied, GWAS can identify different genome regions responsible for shaping a trait (Yang et al. 2021). By introducing varieties with very different yield potential into the analyzed population, regions with major effects can be identified. Under long-term selection, polymorphism in these regions may have been lost, and other regions may contribute to yield. GWAS analysis of advanced breeding lines provides an opportunity to identify loci responsible for yield in a narrow gene pool and should indicate the main selection goals that can be achieved using marker-assisted selection. The present study aimed to identify the genomic region(s) associated with grain yield (GY) and component traits, i.e., coefficients of yield stability (STA), days to heading (DTH), plant height (PH), lodging (LDG), and thousand kernel weight (TKW) in a panel of elite wheat genotypes in a range of environments through the GWAS approach.

Genotyping and annotations
DArTseq technology (Diversity Arrays Technology Pty Ltd., Bruce, Australia) was used for genotyping 171 winter wheat lines. Markers with minor allele frequencies below 0.05 and over 25% of missing data were removed. The genotyping resulted in 11,117 dominant type silicoDArTs (identified by the presence or absence of the whole target marker sequence) and 8233 SNPs. Most of the genomic and marker data for wheat was annotated on Chinese Spring IWGSC v1.0, and DArT sequences were mapped to the updated reference IWGSC v2.1 at URGI. Based on the BLAST e-score values for the 1.0E-05 threshold, markers' locations were labeled as unique, most likely, homologous, or missing. BLAST of selected DArTseq markers vs. winter (Julius, Jagger, Arina, Mattis, Mace, Norin61, Robigus, and Clair) and spring (Weebill, Lancer, Stanley, Paragon, Spelt, Cadenza, Landmark) wheat from the pangenome project (Walkowiak et al. 2020) was performed on the Galaxy platform (Afgan et al. 2018) accessible at IPK Gatersleben (https:// galaxy-web. ipk-gater sleben. de/). Additionally, markers were mapped to recently sequenced wheat cultivars (Renan_2.1, Zhang1817, Attraktion, Kariega, Fielder) at NCBI (The National Center for Biotechnology Information).

Data analysis
Data were first processed using the Statistica 13.3 software (Tibco, CA, USA). The distribution of the data was checked with the Shapiro-Wilk test. The data were analyzed within a group of experiments conducted under the same agrotechnical conditions (A1 or A2) in Genstat version 21 (VSN International). Yield observations were analyzed using a linear model incorporating random effects of genotype, genotype × experiment interaction, and blocks within experiments. This model was used to calculate the yield score (BLUP) of the genotypes in each experiment, the average yield in the series, and heritability (Cullis et al. 2006). Also, for each set of experiments, an analysis was done using an additive main effect and multiplicative interaction (AMMI) model (Gauch 1992) to determine the coefficients of genotype stability (Purchase et al. 2000) which are the weighted distances of the genotypes from zero in the 2-dimensional plot of AMMI genotype scores.
An array of 8233 SNP markers was cut down to 7422 markers with known locations so that the structure of the population could be analyzed. These markers were further grouped into 705 linkage blocks based on a shared location within a 5-Mbp window (Tyrka et al. 2021a). In the same way, 8914 out of 11,117 silicoDArT markers were mapped on 21 CS wheat chromosomes, and 817 markers representing independent blocks of coupled markers were chosen. Markers with the lowest number of missing data in the blocks were used for the population structure analysis utilizing the STRU CTU RE version 2.3.4 software (Pritchard et al. 2000). The  Lodging LDG Measured using a visual score from not logged (0) to completely logged (9) Scale 1-9 Plant height PH Measured from ground level to the base of the spike at physiological maturity cm Thousand kernel weight TKW Measured as the average of three samples of 100 kernels g admixture model was selected with 10,000 cycles and 1000 repetitions per cycle. The test was carried out over ten repetitions for ten possible subpopulations (K = 1-10). The K parameter was selected according to Evanno et al. (2005). The general (GLM) and mixed (MLM) linear models with PCA-based structure correction were used to determine the marker-trait associations using the TASSEL 5.0 (Ithaca, New York, NY, USA) (Bradbury et al. 2007). Benjamini-Hochberg (BH) method (Benjamini and Hochberg 1995) was used to adjust the P-values for allelic substitution effects for multiple tests. Associations were considered significant if the BH-corrected P-value was below 0.05, which usually meant that the original P-value was below 0.001. The Bonferroni corrected P-values for associations with silicoDArTs and SNPs were 0.0007 and 0.0006, respectively.

Results
Significant variations and effects of the environment were found for all the traits studied ( Table 2). The experimental design does not allow a direct comparison of the effects of applied nitrogen fertilization at A1 and A2 agrotechnical levels. The mean grain yields at A1 and A2 cultivation levels were 11.29 and 10.84 t·ha −1 , respectively. At the A1 level, the average plant height was 102.2 cm, and the lodging score was 7.59. Retardant sprays were applied at the A2 level, resulting in a mean plant height of 97.6 cm and lodging of 7.07. Except for DTH, standard deviations from experiments at level A1 were lower than those at level A2 (Table 2). Experiments conducted on A1 showed higher heritability of GY compared to the A2 level (0.714 and 0.568, respectively); therefore, they may provide more stable data for GWAS and genomic prediction studies. Higher heritability values on the A1 cultivation level were also found for PH, LDG, and TKW (Table 2).
DArTseq analysis yielded two types of markers, i.e., SNPs and PAVs (silicoDArT). Due to the different characteristics of these markers, they were used separately in the analysis. SNP markers were identified as polymorphisms in 69-bp long nucleotide sequences of DArTseq markers. SilicoDArT markers, on the other hand, refer to the presence or absence of an entire marker sequence in individual genotypes. PAVs may result from mutations of a genetic or epigenetic nature in the site recognized by the restriction enzymes used to generate the marker fragments. The distribution of both marker types in the wheat genome is not even, and differences in marker saturation on the chromosomes and genomes can be noticed (Fig. 2, Table S2).
The distribution of DArTseq markers on wheat chromosomes is not random, and a higher density can be observed in the distal regions. The silicoDArT markers cover the genome better than the SNP markers, which is best seen in the proximal regions of chromosomes 4B, 4D, 6A, and 6D ( Fig. 2). At the sub-genome scale, most markers were mapped to chromosomes from the B, A, and then D genomes.
To compensate for the uneven representation of particular regions of the genome, 1706 SNP and 2383 silicoDArT markers, spaced every 5 Mbp, were selected for the analysis of population structure (Fig. 3). It was found that the genotypes could be allocated to two subpopulations, while the detailed allocation of genotypes to these subpopulations based on SNP and silicoDArT markers overlapped for only half of the lines tested (Table S1).
Genotypic and phenotypic data were used to identify markers associated with grain yield and the other traits studied (Tables S3 and S4 ,Figs. 4 and 5). Yield analysis used BLUP values, yield relative to the standard (GY%), and stability results. No MTA was found for yield data at two locations (KOH and SMH) and respective BLUP values on the A2 fertilization level. In total, 95 and 422 MTAs with GY data were identified for SNPs and silicoDArTs,  (Table S5). MTAs for GY%, GY_BLUP, and site-specific yield that were mapped on common linkage blocks were used to choose the main loci responsible for GY variation in the selected panel of genotypes.
The universal loci significant for yield improvement were selected to better understand the main genetic factors influencing GY in an ongoing wheat breeding program. The main regions were selected when at least four independent MTAs In total, 15 main regions were identified using the combined MTAs obtained for SNP and DArTseq markers (Table S5). The variation in GY_BLUP explained by the selected loci varied from 20.3% for QGy.rut-3A to 7.9% for QGy.rut-6A. Loci QGy. rut-3D, QGy.rut-5B, and QGy.rut-6B had pleiotropic effects. QGy.rut-3D shaped additionally other traits such as PH (9%), TKW (9.4%), and LDG (11.7%). QGy.rut-5B was simultaneously responsible for 15.9% of the variation in DTH, and QGy. rut-6B had a pleiotropic effect on PH (10% of variation). A single MTA (QGy.rut-5A) for stability was identified (Table 3).

Discussion
Genomic regions harboring selection signatures were different by over 80% between the European and Asian germplasm, suggesting independent improvement targets from the two geographic origins (Pont et al. 2019). Therefore, the selection of genotypes for association analyses depends on the research objective. Genetically diverse or segregating populations can be used to identify major loci determining complex quantitative traits. However, not all loci determining wide variation may be relevant for ongoing breeding programs. In the genetic background uniform for selected main genes, other genes are becoming more important. GWAS on elite lines from pre-registration experiments enables the identification of regions significant for yield improvement.
We identified a set of SNP and PAV markers for 15 main regions for yield improvement in ongoing winter wheat breeding programs. We used meta-analyses (Yang et al. 2021) to find four yield-related regions overlapping with meta-QTLs (Table 5). QTL 2B-5 was reported to     Wheat yield strongly depends on the efficient accumulation of starch in grains. Starch contributes to 60-75% of the total dry weight of the wheat grain (Sawaya et al. 1984). Starch biosynthesis involves enzymes necessary to produce sucrose in the photosynthesis process. Then, sucrose is transported to amyloplasts and metabolized to hexose phosphate. Hexose phosphate is a substrate for the biosynthesis of oil, protein, and starch. During endosperm development, most of the phosphate is used to produce starch. In amyloplasts, hexose phosphate is metabolized to ADP-glucose (Shewry 2009;Thitisaksakul et al. 2012). The activities of four key enzymes involved in sucrose-tostarch conversion, sucrose synthase (SuSase), adenosine diphosphate-glucose pyrophosphorylase (AGPase), starch synthase (StSase), and starch branching enzyme (SBE), were significantly correlated with the grain-filling rate . The wheat sucrose synthase 2 gene (TaSus2-2B) affecting grain weight has also been identified (Jiang et al. 2011)  Comparative analysis of yield-related traits revealed 145 meta-QTLs and candidate genes (Yang et al. 2021). About 40 genes associated with GY and related traits have been cloned (Liu et al. 2012;Rasheed et al. 2016;Nadolska-Orczyk et al. 2017), and functional markers have been converted to competitive allele-specific PCR (KASP) (Rasheed et al. 2016). However, some of these genes have already been established in modern lines. For example, no genetic differentiation was detected around the photoperiod regulation genes Ppd-B1, Vrn-2, and Vrn-3 (Cavanagh et al. 2013). Most accessions carrying the favorable haplotype at these QTLs came from CIMMYT, with 95% of them also carrying the dwarfing allele at Rht-B1 (Garcia et al. 2019). Other genes, TaNMR-1B and TaCOL5-7B, associated with yield increase in biparental populations have been cloned (Kan et al. 2020;Zhang et al. 2022) but not introduced to Polish breeding programs, and no significant MTAs were found in the respective regions.
Nine QTLs colocalized with regions identified in metaanalysis (Table 6) by Yang et al. (2021), including six (2A-2, 2B-2, 2D-2, 4A-2, 5A-3, and 6A-1) associated with kernel number, width, and yield. In addition, region QPhen.rut-3A corresponded to the IWA94 marker (3A 727.9-741.1) of a pleiotropic locus significantly associated with GY and six other yield-related traits ). Some loci with known genes such as Rht-B1,  have been routinely employed in marker-assisted selection (Garcia et al. 2019). For this set of genes, only the QPhen. rut-5A.3 locus is located at the position of the Vrn-A1 gene (NC_057806.1, 5A:589,259,335.0.589271309), while no significant effects were found for the remaining genes, which may indicate the fixation of these alleles in modern breeding lines. For example, we found no significant effect of Rht24 localized on the 6A chromosome at position 413.7 Mbp (Würschum et al. 2017).
Achieving optimal plant height is of prime importance for the cultivars' stability, productivity, and yield potential (Griffiths et al. 2012). Improvement in wheat yield during the Green Revolution was achieved through the introduction of reduced-height (Rht) dwarfing genes. More than 50 loci and 25 height-reducing genes have been detected for wheat (Yang et al. 2021;Muhammad et al. 2021;Mokrzycka et al. 2022). Lodging may contribute to a reduction in grain yields of up to 50% (Stapper and Fischer 1990) and a loss of bread-making quality (Berry et al. 2004). The unpredictable occurrence of lodging has made it difficult for breeders to select for lodging tolerance. Ultimately, diagnostic genetic markers would help improve standability in a breeding program (Dreccer et al. 2022). By adding the semi-dwarfing genes Rht-B1b and Rht-D1b to modern wheat cultivars (Wilhelm et al. 2013;Berry and Berry 2015), the risk of lodging has been cut down. We found no effects from loci in the region of these genes. However, the TaCM (triacetin 3′,4′,5′-O-trimethyltransferase-like) gene responsible for lodging tolerance (Ma 2009) was mapped to chromosome 3B in a position consistent with QPhen.rut-3B.2. The loci and significant SNP markers from this study can be used to create high-yield varieties by pyramiding the advantageous alleles. The introduction of a few major genes/ QTL as fixed effects in GS models increases the accuracy of genomic selection for quantitative traits (Bernardo 2014) if each gene contributes to ≥ 10% of the variance (Sehgal et al. 2020). However, such significant effects of QTLs are rarely identified for complex traits such as GY in a typical GWAS study . The significant MTAs found in this study show a change in the genetic variation of the tested elite germplasm. To improve yield gains, an optimized set of markers should be used.
Author contribution All authors contributed to the study conception and design. The first draft of the manuscript was written by M. Tyrka and all authors commented on previous versions of the manuscript. Tyrka M., Krajewski P., Bednarek P.T, and Rączka K made the data analyses; Drzazga T, Matysik P, Martofel R, Woźna-Pawlak U, Jasińska D, Niewińska M, Ługowska B, Ratajczak D, Sikora T, Witkowski E, and Dorczyk A conducted field experiments and made phenotypic observations; Tyrka D conducted laboratory experiments. All authors read and approved the final manuscript.
Funding This project was supported by the grant "Genomic selection of common wheat" (DHR.hn.802.7.2022) from the Minister for Agriculture and Rural Development.
Data Availability All data underlying the findings described in the manuscript are fully available without restriction from corresponding author.

Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creat iveco mmons. org/ licen ses/ by/4. 0/.