Introduction  

Common wheat (Triticum aestivum L.) provides about 20% of the total calories consumed by the human population (Rasheed et al. 2018). The worldwide wheat harvest area exceeds 213 million ha, and about 28% of this area is located in Europe, including over 2.3 million ha in Poland (FAOSTAT 2022). However, the territorial expansion of wheat cultivation is limited, and meeting the challenge of doubling the wheat yield by 2050 (Rasheed et al. 2018) requires a significant yield increase per unit of area. To this end, besides agronomic practices for yield improvement, the genetic diversity preserved in landraces (Vikram et al. 2016), synthetic wheat varieties (Li et al. 2018), and wild relatives (Rasheed et al. 2018) needs to be identified and exploited in modern wheat cultivars. The sequencing of the 17 Gb allohexaploid wheat (AABBDD) genome of Chinese Spring paved the way for genome-wide association studies (GWAS) and genomic selection in common wheat (Lukaszewski et al. 2014; Appels et al. 2018).

The wheat reference sequence provided a physical framework for mapping previously developed genetic markers with known sequences (Alaux et al. 2018), and marker sequences deposited in databases can be used to find regions with target genes (Tyrka et al. 2021b). Hybridization arrays and next-generation sequencing (NGS) are the most common ways to find single-nucleotide polymorphisms (SNPs) and presence–absence variations (PAVs). With the continuous development of new high-throughput NGS methods, genotyping by sequencing (GBS) technologies (e.g., DArTseq) are considered a cost-efficient genotyping alternative (Jia et al. 2018) for genomics-based breeding (Poland et al. 2012). GBS provides the genetic information needed to detect economically significant marker–trait associations and to develop new wheat cultivars.

Two main approaches to dissecting the genetic basis of complex quantitative traits in crop plants are genome-wide association studies (GWAS) and quantitative trait loci (QTL) mapping. Many QTLs associated with yield-related traits in bread wheat have been identified in biparental populations (Jin et al. 2020; Isham et al. 2021; Kang et al. 2021; Li et al. 2022). At present, GWAS is used more frequently because it allows the identification of loci underlying complex, essential traits that are relevant in the studied panel of genotypes (Neumann et al. 2011; Sukumaran et al. 2018; Liu et al. 2018; Garcia et al. 2019; Qaseem et al. 2019; Sheoran et al. 2019; Akram et al. 2021). Regions associated with grain yield and its component traits in wheat have been identified in drought and irrigated production conditions (Golabadi et al. 2011; Neumann et al. 2011; Assanga et al. 2017; Bhusal et al. 2017; Li et al. 2019; Khan et al. 2022). Haplotypes found in GWAS (Sehgal et al. 2020) that are linked to grain yield (GY) are also needed to map candidate genes (Nadolska-Orczyk et al. 2017).

Increased GY is the primary goal of wheat breeding. GY is strongly influenced by the environment and can be dissected into numerous traits related to phenology and kernel development. Major QTLs significantly associated with yield are currently the target for cloning, and comparative analysis of yield-related traits revealed 145 meta-QTLs and candidate genes (Yang et al. 2021). One of the leading environmental factors influencing yield is nitrogen availability. Nitrogen fertilizer, often used to increase production per unit area, can cause lodging. In wheat, lodging generally occurs after the flowering stage and can affect both grain yield and grain quality. Lodging can also be caused by environmental factors, diseases, or pests affecting stems or roots (Keller et al. 1999). Stem characteristics such as material strength, which depends on lignin concentration, and stem thickness play a role in lodging resistance (Berry et al. 2007; Berry and Berry 2015; Dreccer et al. 2020, 2022; Piñera-Chavez et al. 2016).

Depending on the population studied, GWAS can identify different genome regions responsible for shaping a trait (Yang et al. 2021). By introducing varieties with very different yield potential into the analyzed population, regions with major effects can be identified. Under long-term selection, polymorphism in these regions may have been lost, and other regions may contribute to yield. GWAS analysis of advanced breeding lines provides an opportunity to identify loci responsible for yield in a narrow gene pool and should indicate the main selection goals that can be achieved using marker-assisted selection. The present study aimed to identify the genomic region(s) associated with grain yield (GY) and component traits, i.e., coefficients of yield stability (STA), days to heading (DTH), plant height (PH), lodging (LDG), and thousand kernel weight (TKW) in a panel of elite wheat genotypes in a range of environments through the GWAS approach.

Material and methods

Phenotypic data collection and analysis

Plant material included 168 breeding lines of common winter wheat and three cultivars evaluated in pre-registration trials in the 2019/2020 season (Table S1). The lines were planted at ten research stations located at Dębina (DED, N54°7′40″, E19°2′7″), Kobierzyce (KBP, N50°58′34″, E16°55′53″), Kończewice (KOH, N53°11′5″, E18°33′15″), Krzemlin (KRZ, N53°4′30″, E14°52′48″), Modzurów (MOB, N50°9′21″, E18°7′38″), Nagradowice (NAD, N52°19′4″, E17°9′1.7″), Polanowice (POB, N50°12′25″, E20°5′5″), Radzików (RAH, N52°12′53″, E20°38′45″), Smolice (SMH, N51°41′58″, E17°10′29″), and Strzelce (STH, N52°18′52″, E19°24′20″) dispersed across Poland (Fig. 1). Weather data indicate low rainfall in March and April at most of the sites (Figure S1). The experiments were set up in a split-block design in three sets of 56 lines, each with three reference cultivars (Artist, Patras, and RGT Kilimanjaro) and 21 incomplete blocks per set. Each block consisted of 8 or 9 randomly assigned genotypes, accounting for three repetitions per genotype. The yield was measured for a 10-m² plot (8 rows, 12.5 cm apart, and 10 m long). Only the inner six rows were harvested to avoid edge effects. Two agrotechnical levels were used. At the standard level (A1), cultivation and fertilization followed the production practice of the respective experimental station. At the intensive level (A2), nitrogen fertilization was increased by 40 kg/ha compared to level A1, and the plants were protected from disease and lodging. Experiments on the A1 level were conducted at five stations: DED, NAD, POB, RAH, and STH. Yield at the A2 level was measured at KBP, KOH, KRZ, MOB, and SMH stations. Grain yield was recorded along with four traits (Table 1). Grain yield was also expressed relative to the average of three high-yielding reference cultivars (Artist, Patras, and RGT Kilimanjaro), which was taken as a base of 100% (GY%). For yield, coefficients of stability (STA) were also calculated and used in GWAS to determine loci responsible for reducing environment-specific effects (see the “Data analysis” section below).
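
The relative yield (GY%) described above can be illustrated with a minimal sketch. It assumes that GY% is simply the yield of a line divided by the mean yield of the three reference cultivars at the same site, multiplied by 100. The function name, the line name, and the yield values are hypothetical and serve only as an illustration.

```python
from statistics import mean

# Reference cultivars named in the text; their mean yield is the 100% base.
REFERENCE_CULTIVARS = ("Artist", "Patras", "RGT Kilimanjaro")


def relative_yield(yields_t_ha, references=REFERENCE_CULTIVARS):
    """Express every yield as a percentage of the reference-cultivar mean."""
    base = mean(yields_t_ha[name] for name in references)
    return {name: 100.0 * value / base for name, value in yields_t_ha.items()}


# Hypothetical yields (t/ha) at one station; "LINE_001" is an invented name.
example = {
    "Artist": 11.0,
    "Patras": 10.0,
    "RGT Kilimanjaro": 12.0,
    "LINE_001": 11.55,
}
gy_percent = relative_yield(example)
```

With these invented values the reference mean is 11.0 t/ha, so a line yielding 11.55 t/ha is scored at about 105%.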

Fig. 1

Distribution of experimental stations in Poland. DED—Dębina, KBP—Kobierzyce, KOH—Kończewice, KRZ—Krzemlin, MOB—Modzurów, NAD—Nagradowice, POB—Polanowice, RAH—Radzików, SMH—Smolice, and STH—Strzelce

Table 1 Agronomical and morpho-physiological traits analyzed at ten sites in 2020

Genotyping and annotations

DArTseq technology (Diversity Arrays Technology Pty Ltd., Bruce, Australia) was used for genotyping 171 winter wheat lines. Markers with a minor allele frequency below 0.05 or with more than 25% missing data were removed. The genotyping resulted in 11,117 dominant-type silicoDArTs (identified by the presence or absence of the whole target marker sequence) and 8233 SNPs. Most of the genomic and marker data for wheat was annotated on Chinese Spring IWGSC v1.0, and DArT sequences were mapped to the updated reference IWGSC v2.1 at URGI. Based on BLAST E-values at a threshold of 1.0E-05, marker locations were labeled as unique, most likely, homologous, or missing. BLAST of selected DArTseq markers vs. winter (Julius, Jagger, Arina, Mattis, Mace, Norin61, Robigus, and Clair) and spring (Weebill, Lancer, Stanley, Paragon, Spelt, Cadenza, Landmark) wheat from the pangenome project (Walkowiak et al. 2020) was performed on the Galaxy platform (Afgan et al. 2018) accessible at IPK Gatersleben (https://galaxy-web.ipk-gatersleben.de/). Additionally, markers were mapped to recently sequenced wheat cultivars (Renan_2.1, Zhang1817, Attraktion, Kariega, Fielder) at the National Center for Biotechnology Information (NCBI).
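
The marker quality filter can be sketched as follows. This is only an illustration under stated assumptions: SNP calls are coded as 0, 1, or 2 copies of the alternative allele with None for a missing call, and a marker is discarded when either criterion fails. The function names and the coding scheme are hypothetical and are not taken from the DArTseq pipeline.

```python
def marker_stats(calls):
    """Return (minor allele frequency, missing rate) for one marker.

    calls: list of 0/1/2 allele counts per line, None for a missing call
    (an assumed coding, chosen here for illustration).
    """
    called = [c for c in calls if c is not None]
    missing_rate = 1 - len(called) / len(calls)
    if not called:
        return 0.0, missing_rate
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1 - alt_freq), missing_rate


def passes_filters(calls, min_maf=0.05, max_missing=0.25):
    """Keep a marker only if MAF >= 0.05 and missing data <= 25%."""
    maf, missing_rate = marker_stats(calls)
    return maf >= min_maf and missing_rate <= max_missing


# Invented examples: a common marker, a rare one, and a poorly called one.
common = [0, 0, 2, 2, None, 0, 2, 0]
rare = [0] * 39 + [2]
sparse = [None, None, None, 0, 2, 0, 2, 0, 2, 0]
kept = [passes_filters(m) for m in (common, rare, sparse)]
```

Only the first invented marker is retained: the second has a minor allele frequency of 0.025, and the third has 30% missing calls.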

Data analysis

Data were first processed using the Statistica 13.3 software (Tibco, CA, USA). The distribution of the data was checked with the Shapiro–Wilk test. The data were analyzed within a group of experiments conducted under the same agrotechnical conditions (A1 or A2) in Genstat version 21 (VSN International). Yield observations were analyzed using a linear model incorporating random effects of genotype, genotype × experiment interaction, and blocks within experiments. This model was used to calculate the yield score (BLUP) of the genotypes in each experiment, the average yield in the series, and heritability (Cullis et al. 2006). Also, for each set of experiments, an analysis was done using an additive main effect and multiplicative interaction (AMMI) model (Gauch 1992) to determine the coefficients of genotype stability (Purchase et al. 2000) which are the weighted distances of the genotypes from zero in the 2-dimensional plot of AMMI genotype scores.
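
As an illustration of the stability coefficient, the sketch below computes a weighted distance from the origin in the plane of the first two AMMI interaction axes. It assumes the AMMI stability value form associated with Purchase et al. (2000), in which the first-axis score is weighted by the ratio of the sums of squares of the two axes. The function name and the input values are hypothetical, and the AMMI decomposition itself (performed in Genstat in this study) is not reproduced here.

```python
import math


def ammi_stability_value(ipca1, ipca2, ss_ipca1, ss_ipca2):
    """Weighted distance of a genotype from the origin of the IPCA1-IPCA2 plot.

    ipca1, ipca2: genotype scores on the first two interaction axes.
    ss_ipca1, ss_ipca2: interaction sums of squares explained by each axis.
    Values close to zero indicate a more stable genotype.
    """
    weight = ss_ipca1 / ss_ipca2
    return math.sqrt((weight * ipca1) ** 2 + ipca2 ** 2)


# Invented scores for one genotype; the first axis explains twice the
# interaction sum of squares of the second axis.
asv = ammi_stability_value(ipca1=0.3, ipca2=0.4, ss_ipca1=20.0, ss_ipca2=10.0)
```

Weighting the first axis reflects its larger share of the genotype × environment interaction, so a genotype displaced along that axis is penalized more strongly.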

The set of 8233 SNP markers was reduced to 7422 markers with known locations so that the structure of the population could be analyzed. These markers were further grouped into 705 linkage blocks based on a shared location within a 5-Mbp window (Tyrka et al. 2021a). In the same way, 8914 out of 11,117 silicoDArT markers were mapped on 21 CS wheat chromosomes, and 817 markers representing independent blocks of coupled markers were chosen. Markers with the lowest number of missing data in the blocks were used for the population structure analysis in STRUCTURE version 2.3.4 (Pritchard et al. 2000). The admixture model was selected with 10,000 cycles and 1000 repetitions per cycle. The test was carried out over ten repetitions for ten possible subpopulations (K = 1–10). The K parameter was selected according to Evanno et al. (2005). The general (GLM) and mixed (MLM) linear models with PCA-based structure correction were used to determine the marker–trait associations (MTAs) in TASSEL 5.0 (Bradbury et al. 2007). The Benjamini–Hochberg (BH) method (Benjamini and Hochberg 1995) was used to adjust the P-values for allelic substitution effects for multiple tests. Associations were considered significant if the BH-corrected P-value was below 0.05, which usually meant that the original P-value was below 0.001. The Bonferroni corrected P-values for associations with silicoDArTs and SNPs were 0.0007 and 0.0006, respectively.
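
The multiple-testing adjustment can be illustrated with a short sketch of the Benjamini–Hochberg step-up procedure. This is a generic implementation written for illustration, not the code used in the study, and the P-values in the example are invented.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P-values in the original order.

    Each P-value is multiplied by m / rank (rank 1 = smallest P-value), and
    monotonicity is enforced by taking a running minimum from the largest
    P-value downwards.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw P-values for four marker-trait tests.
raw = [0.01, 0.04, 0.03, 0.005]
adj = benjamini_hochberg(raw)
significant = [p < 0.05 for p in adj]
```

In this invented example all four adjusted values (about 0.02, 0.04, 0.04, and 0.02) stay below the 0.05 threshold used for declaring an association significant.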

Results

Significant variations and effects of the environment were found for all the traits studied (Table 2). The experimental design does not allow a direct comparison of the effects of applied nitrogen fertilization at A1 and A2 agrotechnical levels. The mean grain yields at A1 and A2 cultivation levels were 11.29 and 10.84 t·ha−1, respectively. At the A1 level, the average plant height was 102.2 cm, and the lodging score was 7.59. Retardant sprays were applied at the A2 level, resulting in a mean plant height of 97.6 cm and lodging of 7.07. Except for DTH, standard deviations from experiments at level A1 were lower than those at level A2 (Table 2). Experiments conducted on A1 showed higher heritability of GY compared to the A2 level (0.714 and 0.568, respectively); therefore, they may provide more stable data for GWAS and genomic prediction studies. Higher heritability values on the A1 cultivation level were also found for PH, LDG, and TKW (Table 2).

Table 2 Means, standard deviations, heritability (H), and F-values for phenotypic characteristics measured under A1 and A2 agronomic levels

DArTseq analysis yielded two types of markers, i.e., SNPs and PAVs (silicoDArT). Due to the different characteristics of these markers, they were used separately in the analysis. SNP markers were identified as polymorphisms in 69-bp long nucleotide sequences of DArTseq markers. SilicoDArT markers, on the other hand, refer to the presence or absence of an entire marker sequence in individual genotypes. PAVs may result from mutations of a genetic or epigenetic nature in the site recognized by the restriction enzymes used to generate the marker fragments. The distribution of both marker types in the wheat genome is not even, and differences in marker saturation on the chromosomes and genomes can be noticed (Fig. 2, Table S2).

Fig. 2

Physical distribution of selected 8233 SNP (A) and 11,117 silicoDArT markers (B) on wheat chromosomes (IWGSC RefSeq v2.1). Chromosomes 1–7 are numbered within the A, B, and D genomes. Mbp – millions of base pairs

The distribution of DArTseq markers on wheat chromosomes is not random, and a higher density can be observed in the distal regions. The silicoDArT markers cover the genome better than the SNP markers, which is best seen in the proximal regions of chromosomes 4B, 4D, 6A, and 6D (Fig. 2). At the sub-genome scale, the largest number of markers mapped to the B genome, followed by the A and D genomes.

To compensate for the uneven representation of particular regions of the genome, 1706 SNP and 2383 silicoDArT markers, spaced every 5 Mbp, were selected for the analysis of population structure (Fig. 3). The genotypes could be allocated to two subpopulations. However, the allocation of individual genotypes to these subpopulations based on SNP and silicoDArT markers agreed for only half of the lines tested (Table S1).
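
The selection of one marker per 5-Mbp interval can be sketched as below. The sketch assumes fixed, non-overlapping 5-Mbp bins along each chromosome and keeps the marker with the fewest missing calls in each bin. The function name, the tuple layout, and the marker names are hypothetical, and the actual block definition used in the study may differ.

```python
WINDOW_BP = 5_000_000  # 5-Mbp window, as in the text


def pick_block_representatives(markers, window_bp=WINDOW_BP):
    """Keep one marker per (chromosome, window) bin.

    markers: iterable of (name, chromosome, position_bp, n_missing) tuples.
    Within a bin, the marker with the fewest missing calls is retained.
    """
    best = {}
    for name, chrom, pos, n_missing in markers:
        block = (chrom, pos // window_bp)
        if block not in best or n_missing < best[block][3]:
            best[block] = (name, chrom, pos, n_missing)
    return [best[block][0] for block in sorted(best)]


# Invented markers: two share the first 5-Mbp bin of chromosome 1A.
markers = [
    ("m1", "1A", 1_200_000, 3),
    ("m2", "1A", 4_900_000, 1),
    ("m3", "1A", 5_100_000, 0),
    ("m4", "2B", 700_000, 2),
]
selected = pick_block_representatives(markers)
```

Thinning markers in this way reduces the over-representation of marker-dense distal regions before population structure is estimated.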

Fig. 3

Number of populations identified with 1706 SNP (A) and 2383 silicoDArT (B) markers representing linkage blocks

Genotypic and phenotypic data were used to identify markers associated with grain yield and the other traits studied (Tables S3 and S4, Figs. 4 and 5). Yield analysis used BLUP values, yield relative to the standard (GY%), and stability results. No MTA was found for yield data at two locations (KOH and SMH) and respective BLUP values on the A2 fertilization level. In total, 95 and 422 MTAs with GY data were identified for SNPs and silicoDArTs, respectively (Table S5). MTAs for GY%, GY_BLUP, and site-specific yield that were mapped on common linkage blocks were used to choose the main loci responsible for GY variation in the selected panel of genotypes.

Fig. 4

Distribution of MTA for grain yield (GY) for BLUP values, relative to standard (GY%), lines yield at selected locations at A1 and A2 level (GY_LOC_A1 and GY_LOC_A2, respectively), days to heading (DTH), thousand kernel weight (TKW), lodging (LDG), plant height (PH), and stability (GY_STA) on chromosomes covered with SNP markers

Fig. 5

Distribution of MTA for grain yield (GY) for BLUP values, relative to standard (GY%), lines yield at selected locations at A1 and A2 level (GY_LOC_A1 and GY_LOC_A2, respectively), days to heading (DTH), thousand kernel weight (TKW), lodging (LDG), and plant height (PH) on chromosomes covered with silicoDArT markers

Loci that were consistently significant for yield were selected to better understand the main genetic factors influencing GY in an ongoing wheat breeding program. A region was considered a main region when at least four independent MTAs for grain yield or stability coincided in a single linkage block. In total, 15 main regions were identified using the combined MTAs obtained for SNP and silicoDArT markers (Table S5). The variation in GY_BLUP explained by the selected loci ranged from 7.9% for QGy.rut-6A to 20.3% for QGy.rut-3A. Loci QGy.rut-3D, QGy.rut-5B, and QGy.rut-6B had pleiotropic effects. QGy.rut-3D additionally affected PH (9%), TKW (9.4%), and LDG (11.7%). QGy.rut-5B was simultaneously responsible for 15.9% of the variation in DTH, and QGy.rut-6B had a pleiotropic effect on PH (10% of variation). A single MTA (QGy.rut-5A) for stability was identified (Table 3).
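
The rule of at least four coinciding MTAs per linkage block can be sketched as follows. The sketch assumes that linkage blocks are fixed 5-Mbp bins and that independent MTAs are distinct marker–trait pairs. Both are simplifying assumptions, and the function name, marker names, and positions are invented for illustration.

```python
from collections import defaultdict


def main_regions(mtas, min_mtas=4, window_bp=5_000_000):
    """Return {(chromosome, bin index): MTA count} for blocks meeting the rule.

    mtas: iterable of (marker, chromosome, position_bp, trait) tuples, one per
    significant marker-trait association.
    """
    per_block = defaultdict(set)
    for marker, chrom, pos, trait in mtas:
        per_block[(chrom, pos // window_bp)].add((marker, trait))
    return {
        block: len(pairs)
        for block, pairs in per_block.items()
        if len(pairs) >= min_mtas
    }


# Invented associations: four fall in one block on 3A, one lies alone on 6A.
mtas = [
    ("snp1", "3A", 701_000_000, "GY_BLUP"),
    ("snp1", "3A", 701_000_000, "GY%"),
    ("dart7", "3A", 703_500_000, "GY_BLUP"),
    ("dart7", "3A", 703_500_000, "GY_DED"),
    ("snp9", "6A", 10_000_000, "GY%"),
]
regions = main_regions(mtas)
```

Here only the invented 3A block (700 to 705 Mbp) qualifies, because it collects four distinct marker–trait pairs.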

Table 3 Main loci from linkage blocks responsible for variation of grain yield and stability. Number of significant MTAs in brackets

MTAs for 183 SNP and 198 PAV loci were identified for heading time, lodging resistance, plant height, or thousand kernel weight (Table S6). These data were used to select 23 central regions corresponding with variation, mainly in PH, DTH, TKW, and LDG, with 18, 8, 6, and 2 linkage blocks, respectively (Table 4). Selected loci like QPhen.rut-3D and QPhen.rut-2A accumulated 76 and 44 MTAs and explained 15.8% and 10.3% of the variation in PH, respectively. In addition, two main loci (QPhen.rut-5B.1 and QPhen.rut-6A.1) were found for variation in lodging.

Table 4 Main loci from linkage blocks responsible for variation in days to heading (DTH), plant height (PH), thousand kernel weight (TKW), and lodging (LDG). Number of significant MTAs in brackets

Discussion

Genomic regions harboring selection signatures differed by over 80% between the European and Asian germplasm, suggesting independent improvement targets in the two geographic origins (Pont et al. 2019). Therefore, the selection of genotypes for association analyses depends on the research objective. Genetically diverse or segregating populations can be used to identify major loci determining complex quantitative traits. However, not all loci determining wide variation may be relevant for ongoing breeding programs. In a genetic background that is uniform for the main genes under selection, other genes become more important. GWAS on elite lines from pre-registration experiments enables the identification of regions significant for yield improvement.

We identified a set of SNP and PAV markers for 15 main regions for yield improvement in ongoing winter wheat breeding programs. We used meta-analyses (Yang et al. 2021) to find four yield-related regions overlapping with meta-QTLs (Table 5). QTL 2B-5 was reported to affect the number of grains. Regions 3A-4 and 6A-8, with the genes MOC2 and OSGA20ox1, determine variation in seed number, weight, and yield. Another QTL 7B-8 with the Brd2 gene conditions seed number and weight (Yang et al. 2021). The location of QGy.rut-5A was consistent with the position of haplotype H20271, which is associated with variation in yield (Li et al. 2018), and QYld.aww-5A explained 2.3% of the variance (Garcia et al. 2019). SNP S2B_692461029 (TraesCS2B01G495700) affecting the number of grains was localized in the region corresponding to QGy.rut-2B.4 (Pradhan et al. 2019).

Table 5 Main loci responsible for variation of grain yield and stability and corresponding genes, haplotypes, or QTLs

Wheat yield strongly depends on the efficient accumulation of starch in grains. Starch accounts for 60–75% of the total dry weight of the wheat grain (Sawaya et al. 1984). Starch biosynthesis involves enzymes necessary to produce sucrose in the photosynthesis process. Then, sucrose is transported to amyloplasts and metabolized to hexose phosphate. Hexose phosphate is a substrate for the biosynthesis of oil, protein, and starch. During endosperm development, most of the hexose phosphate is used to produce starch. In amyloplasts, hexose phosphate is metabolized to ADP-glucose (Shewry 2009; Thitisaksakul et al. 2012). The activities of four key enzymes involved in sucrose-to-starch conversion, sucrose synthase (SuSase), adenosine diphosphate-glucose pyrophosphorylase (AGPase), starch synthase (StSase), and starch branching enzyme (SBE), were significantly correlated with the grain-filling rate (Zhang et al. 2011). The wheat sucrose synthase 2 gene (TaSus2-2B) affecting grain weight has also been identified (Jiang et al. 2011) on chromosome 2B at 179 Mbp. We found that a probable sucrose-phosphate synthase 4 gene (LOC123076775; 3D: 4,469,233..4,477,157) is close to QGy.rut-3D. Two loci coding starch synthase 3 (LOC100136992, 2B: 698,067,030..698,075,303; LOC123054641, 2D: 577,064,215..577,073,489) are localized in the regions of QGy.rut-2B.2 and QGy.rut-2D.1, respectively.

Comparative analysis of yield-related traits revealed 145 meta-QTLs and candidate genes (Yang et al. 2021). About 40 genes associated with GY and related traits have been cloned (Liu et al. 2012; Rasheed et al. 2016; Nadolska-Orczyk et al. 2017), and functional markers have been converted to competitive allele-specific PCR (KASP) (Rasheed et al. 2016). However, some of these genes have already been established in modern lines. For example, no genetic differentiation was detected around the photoperiod regulation genes Ppd-B1, Vrn-2, and Vrn-3 (Cavanagh et al. 2013). Most accessions carrying the favorable haplotype at these QTLs came from CIMMYT, with 95% of them also carrying the dwarfing allele at Rht-B1 (Garcia et al. 2019). Other genes, TaNMR-1B and TaCOL5-7B, associated with yield increase in biparental populations have been cloned (Kan et al. 2020; Zhang et al. 2022) but not introduced to Polish breeding programs, and no significant MTAs were found in the respective regions.

Nine QTLs colocalized with regions identified in meta-analysis (Table 6) by Yang et al. (2021), including six (2A-2, 2B-2, 2D-2, 4A-2, 5A-3, and 6A-1) associated with kernel number, width, and yield. In addition, region QPhen.rut-3A corresponded to the IWA94 marker (3A 727.9–741.1) of a pleiotropic locus significantly associated with GY and six other yield-related traits (Li et al. 2019).

Table 6 Main loci responsible for variation of days to heading, plant height, thousand kernel weight and lodging, and corresponding genes, haplotypes, or QTLs

Some loci with known genes such as Rht-B1, Rht-D1, Ppd-D1, Ppd-B1, Ppd-A1, Vrn-A1, Vrn-D1, and Vrn-B1 have been routinely employed in marker-assisted selection (Garcia et al. 2019). For this set of genes, only the QPhen.rut-5A.3 locus is located at the position of the Vrn-A1 gene (NC_057806.1, 5A: 589,259,335..589,271,309), while no significant effects were found for the remaining genes, which may indicate the fixation of these alleles in modern breeding lines. For example, we found no significant effect of Rht24 localized on the 6A chromosome at position 413.7 Mbp (Würschum et al. 2017).

Markers connected with plant height may also be significant for grain yield. For example, TaRht12 increases the grain number per spike and the effective tiller number and decreases thousand-grain weight (Chen et al. 2013). This gene significantly improved the elite winter wheat lines investigated (QGy.rut-5A, Table 5). Furthermore, markers Ex_c3405_203 (6B: 0.9 Mbp) and Excalibur_rep_c102984_157 (2D: 641.1 Mbp) associated with the lodging score corresponded to QGy.rut-6B (6B: 4.9–9.1 Mbp) and QGy.rut-2D.2 (2D: 633–635 Mbp), respectively (Dreccer et al. 2022).

Achieving optimal plant height is of prime importance for the cultivars’ stability, productivity, and yield potential (Griffiths et al. 2012). Improvement in wheat yield during the Green Revolution was achieved through the introduction of reduced-height (Rht) dwarfing genes. More than 50 loci and 25 height-reducing genes have been detected for wheat (Yang et al. 2021; Muhammad et al. 2021; Mokrzycka et al. 2022). Lodging may contribute to a reduction in grain yields of up to 50% (Stapper and Fischer 1990) and a loss of bread-making quality (Berry et al. 2004). The unpredictable occurrence of lodging has made it difficult for breeders to select for lodging tolerance. Ultimately, diagnostic genetic markers would help improve standability in a breeding program (Dreccer et al. 2022). The introduction of the semi-dwarfing genes Rht-B1b and Rht-D1b into modern wheat cultivars (Wilhelm et al. 2013; Berry and Berry 2015) has reduced the risk of lodging. We found no effects from loci in the region of these genes. However, the TaCM (tricetin 3′,4′,5′-O-trimethyltransferase-like) gene responsible for lodging tolerance (Ma 2009) was mapped to chromosome 3B in a position consistent with QPhen.rut-3B.2.

The loci and significant SNP markers from this study can be used to create high-yield varieties by pyramiding the advantageous alleles. The introduction of a few major genes/QTL as fixed effects in GS models increases the accuracy of genomic selection for quantitative traits (Bernardo 2014) if each gene contributes to ≥ 10% of the variance (Sehgal et al. 2020). However, such significant effects of QTLs are rarely identified for complex traits such as GY in a typical GWAS study (Sehgal et al. 2016; 2017). The significant MTAs found in this study show a change in the genetic variation of the tested elite germplasm. To improve yield gains, an optimized set of markers should be used.