Background

Bread wheat is an important crop cultivated on ~ 200 million hectares worldwide, and provides one fifth of the total needs of the global population [1,2,3]. Grain yield (GY) improvement is one of the most challenging objectives in wheat breeding due to the complex genetic architecture and low heritability. The Yellow and Huai River Valleys Wheat Zone (YHRVWZ) is the major wheat-producing region in China, and yield potential in this region has been improved over recent decades [4,5,6]. However, wheat production in the region is facing problems of decreasing groundwater and hence reduced irrigation frequency and decreasing growing area in the northern part, and frequent occurrence of Fusarium head blight in the southern part. Moreover, there is a decline in the rate of increase of yield potential in conventional breeding.

GY is a complex trait, significantly associated with spike number per unit area (SN), kernel number per spike (KNS) and thousand-kernel weight (TKW). However, grain shape, spike architecture, plant height (PH), and flag leaf related traits can also affect GY through effects on photosynthetic intensity, grain filling and dry matter translocation [5,6,7,8]. These traits have higher heritabilities (h2) than GY and are easier to select in small plots at the early stages of breeding programs. Previous studies showed that increased yield potential in the YHRVWZ was largely associated with increased kernels per square meter, biomass and harvest index, and reduced PH [5, 6]. Those improvements were mainly attributed to the use of dwarfing genes (Rht1, Rht2, Rht8 and Rht24) and the 1BL.1RS translocation lines [8,9,10,11,12,13]. However, with the current widespread near-fixation of these genes new variation must be sought. It is now believed that further improvement in yield potential can be achieved only by a detailed understanding of its genetic architecture combined with marker-assisted selection (MAS).

MAS is considered to be a key technique to break through yield bottleneck of conventional breeding for further improvement of yield potential of wheat. The application potential of MAS depends on the number of available genes and tightly linked molecular markers. To date, about 65 genes have been cloned in wheat, among which 40 are associated with GY and related traits [14,15,16,17]. For all cloned genes, around 150 functional markers have been converted to kompetitive allele-specific PCR (KASP) formats convenient for high-throughput genotyping [15]. Although there are many reports on quantitative trait loci (QTL) mapping and genome-wide association study (GWAS) of yield and related trait loci [18,19,20,21,22,23], relatively few outcomes have been applied in selection of wheat lines in actual breeding programs. To enhance the application of MAS, more detailed studies on genetic architecture and identification of related loci for GY should be taken.

Single nucleotide polymorphism (SNP) arrays developed from the transcriptomes of plants and animals [24] providing the most advanced approach in searching for candidate genes for economic traits by QTL mapping or GWAS. The wheat 90 K and 660 K SNP arrays are gradually replacing simple sequence repeat (SSR) markers in genetic studies of yield, quality, disease resistance and stress tolerance [25,26,27,28]. In our previous study, 23 new stable QTL and 11 QTL clusters were identified for 12 yield related traits using high-density linkage maps constructed with the wheat 90 K SNP array in three RIL populations [17]. Wheat 50 K and 15 K SNP arrays now available for selecting important traits in wheat programs, include SNP markers derived from the wheat 35 K, 90 K and 660 K arrays, functional markers of cloned genes, and closely linked markers identified by QTL mapping and GWAS. SNP markers are becoming the main tool for genetic studies and breeding of crop species.

Analysis of GWAS data is based on linkage disequilibrium (LD) and provides a much higher resolution capacity to capture insights into the genetic architecture of complex traits than traditional QTL mapping [29]. Unlike QTL mapping, GWAS uses available germplasm as materials and bypasses the time of developing segregating populations. Moreover, QTL mapping by bi-parental populations focuses on specific traits, whereas a wider range of germplasm can be used in GWAS to phenotype many traits with one cycle of genotyping. Genetic variance of traits in crop species may be caused by a single SNP, but is more often attributed to several SNPs within a haplotype block [30]. Therefore, SNP-GWAS and Haplotype-GWAS can be complementary and verifiable in identifying genes controlling complex traits. SNP-GWAS is commonly applied in genetic studies of crop species, whereas Haplotype-GWAS has been mostly used in detecting heterozygous chromosome segments in cross-pollinated crops [31, 32].

The aims of the present study were to: 1) identify stable loci for GY and related traits using both SNP-GWAS and Haplotype-GWAS based on high-density SNP markers, 2) investigate genetic relationships among yield and related traits, and 3) detect available loci for MAS of traits in breeding for high yield.

Results

Phenotypic evaluation

There was significant and continuous variation in GY and related traits across the diverse panel (Additional file 1: Table S1; Additional file 2: Figure S1). ANOVA showed highly significant effects (P < 0.01) of lines, environments and line × environment interactions on all traits (Additional file 3: Table S2). GY in the panel was moderately heritable (h2 = 0.72), whereas the other 12 traits showed high h2 (> 0.89), indicating that most of the traits were stable and largely determined by genetic factors (Additional file 3: Table S2).

GY showed significant (P < 0.01) and positive correlations with TKW and kernel width (KW), but significant and negative correlations with PH, uppermost internode length (UIL) and flag leaf length (FLL) (Additional file 4: Table S3); SN was significantly (P < 0.01) and negatively correlated with KNS, TKW, KW, spike dry weight (SDW) and flag leaf width (FLW) (r = 0.38 to 0.70); KNS exhibited significant (P < 0.01) and positive correlations with spike length (SL), SDW and FLW (r = 0.36 to 0.64); TKW was significantly (P < 0.01) and positively correlated with kernel length (KL), KW and SDW (r = 0.50 to 0.83).

Marker coverage and genetic diversity

After filtering, 326,570 polymorphic SNPs were employed for GWAS analysis; 10,780 were from the wheat 90 K SNP array and 315,790 came from the wheat 660 K SNP array (Additional file 5: Table S4; Additional file 6: Figure S2a). Among polymorphic SNP markers, 39.7, 49.4 and 10.9% were from the A, B and D genomes, respectively. Chromosome 3B had the most SNP markers (41,439), whereas chromosome 4D possessed the least (2061). The total markers spanned a physical distance of 14,064.8 Mb, with an average marker density of 0.043 Mb per marker. The average genetic diversity and polymorphism information content (PIC) for the whole genome were 0.34 and 0.27, respectively, and the average genetic diversities for A, B and D genomes were 0.34, 0.35 and 0.32, and average PIC were 0.28, 0.28 and 0.26, respectively.

Haplotype composition and coverage

Among all polymorphic SNP markers, 275,000 were assigned to 31,748 haplotype blocks, and 116,555 haplotypes were generated based on 4-gamete LD analyses (Additional file 6: Figure S2b; Additional file 7: Table S5). The D genome had the least haplotype blocks and haplotypes (3384 and 10,579), followed by the A (12,574 and 46,891) and B (15,790 and 59,085) genomes. Like the SNP marker coverage, chromosomes 3B and 4D had the most and least haplotype blocks, respectively. The average number of SNP markers for one haplotype block was 7.9, and the average length was 74.7 kb. For A, B and D genomes, the average numbers of SNP markers were 8.8, 8.7 and 6.1, and the average lengths were 85.9, 93.6 and 44.7 kb, respectively. Haplotype blocks on chromosomes 3B and 5B harbored the most SNP markers (10.8 and 11.0) with maximum length of 112.7 and 97.9 kb, whereas the haplotype blocks on chromosome 4D had the least SNP markers (4.3) and the minimum length (24.8). The range of haplotype block length was 0.001–200.0 kb.

Population structure and linkage disequilibrium

As shown in Liu et al. [26], the germplasm consisted of three subgroups; Subgroup I contained 62 varieties mainly from Shandong province and foreign countries; Subgroup II had 54 varieties mainly from Henan, Anhui and Shaanxi provinces; and Subgroup III comprised 50 varieties mainly from Henan. Average LD for the whole genome was 8 Mb, and for A, B and D genomes, 6, 4 and 11 Mb, respectively.

Genome-wide association studies

Totals of 239 and 248 loci for GY and related traits were identified on all 21 chromosomes using Tassel v5.0 and PLINK, respectively (Additional file 8: Table S6; Additional file 9: Figure S3; Additional file 10: Figure S4). In SNP-GWAS 18, 13, 20, 22, 14, 23, 19, 10, 21, 28, 20, 16 and 15 loci were detected for GY, SN, KNS, TKW, KL, KW, SL, SDW, heading date (HD), PH, UIL, FLL and FLW, respectively; in Haplotype-GWAS the corresponding numbers were 20, 13, 9, 27, 13, 32, 11, 13, 21, 27, 27, 24 and 11. In both methods, the D genome possessed the lowest number of loci, consistent with its lowest diversity. One hundred and twenty loci were common in SNP-GWAS and Haplotype-GWAS, 49, 56 and 15 in the A, B and D genomes, respectively (Table 1).

Table 1 Loci for grain yield and related traits in the diverse panel identified by both SNP-GWAS and Haplotype-GWAS and comparisons with previous studies

GY and yield components

Twelve common loci for GY were identified on chromosomes 1A (2), 1B (2), 2A, 2B, 2D, 3A, 3B, 3D, 5A and 5B, with single loci explaining 6.9–17.7% and 9.1–22.6% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. Seven loci, on 1A (AX_110387060 and AX_110418502), 1B (AX_110508372 and AX_109820171), 2D (AX_109941480), 3B (AX_109881378), and 3D (AX_95257733) were detected in four environments and best linear unbiased estimation (BLUE) values by both methods. The 1A (AX_110418502) and 1B (H4268) loci explained the largest of phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively.

Six common loci for SN were detected on chromosomes 5B (2), 6B (2), 6D and 7D, explaining 7.1–17.1% and 9.1–23.3% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The 5B locus (IWB56499) was significant in three environments and BLUE value, whereas the 6D locus (AX_110652999) explained the largest phenotypic variance (7.2–17.1%) in SNP-GWAS. Loci on chromosomes 5B (H22717), 6B (H26344), 6D (H27181) and 7D (H31325) were identified in four environments and BLUE values, among which the 7D locus (H31325) accounted for the largest of phenotypic variance (9.8–23.3%) in Haplotype-GWAS.

Nine common loci for KNS were found on chromosomes 1A (2), 2A, 2D, 3A (2), 5B (2) and 7A, accounting for 7.1–17.1% and 9.1–15.8% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. In SNP-GWAS, the loci on chromosomes 1A (AX_108737858) and 2D (IWB57054) were identified in all six environments and BLUE values; the 2D locus (IWB57054) accounted for the largest phenotypic variance (7.4–17.7%). In Haplotype-GWAS, the loci on chromosomes 1A (H322) and 5B (H21713) were significant in all six environments and BLUE values; the 1A locus (H322) explained the largest phenotypic variance (9.2–15.8%). Four loci, including 1A (AX_111579941), 3A (AX_110657474), 5B (AX_109537496) and 7A (AX_89571435), were identified in five or more environments and BLUE values in both methods and were therefore stable.

Twelve common loci for TKW were identified on chromosomes 1B, 2A, 2B, 4B (3), 5A, 5B, 6B (2), 6D and 7D, explaining 6.7–13.0% and 9.2–27.0% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. In SNP-GWAS, the 4B locus (AX_110713957) and 7D locus (AX_109927697) were significant in at least five environments and BLUE values; in Haplotype-GWAS, all the loci were significant in at least five environments and BLUE values. The 5A (AX_110958315) and 2B (H7872) loci explained the largest phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively.

Kernel shape related traits

Six common loci for KL on chromosomes 1B (2), 2A, 3B, 5B and 5D explained 7.0–14.2% and 9.2–15.4% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The 2A locus (IWB32119) was significant in all six environments and BLUE value, whereas the 5D locus (AX_111122970) was identified in five or six environments and BLUE value by both methods.

Fifteen common loci for KW on chromosomes 1A (2), 2A (2), 2B (2), 3A, 3D, 4A, 4B, 5A (2), 5B, 6B and 7D accounted for 6.8–15.0% and 9.1–36.7% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The 2B locus (AX_111634754) was significant in all six environments and BLUE value with the largest contribution to phenotypic variance in both methods. The 2B (AX_111819405) and 3D (IWB17930) loci were significant in five environments and BLUE values in SNP-GWAS, and significant in all six environments and BLUE values in Haplotype-GWAS.

Spike related traits

Eight common loci for SL were identified on chromosomes 2B, 5A (3), 5B (2), 7B and 7D, explaining 6.7–15.0% and 9.0–20.0% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. Locus (IWB71567) on chromosome 7B was significant in five or more environments and BLUE value in both methods; the 7D locus (AX_110645784) explained 7.6–15.0% and 10.7–20.0% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively.

Five common loci for SDW detected on chromosomes 1A, 3A, 4B, and 5B (2) explained 7.0–19.0% and 9.1–18.5% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The 3A locus (IWA94) was significant in all four environments and BLUE value in both methods; the 5B locus (AX_111183518) was stable across three or four environments and BLUE value in both methods.

Heading date

Eight common loci for HD on chromosomes 2A (3), 2B, 5B, 7A (2) and 7B accounted for 6.6–13.1% and 9.1–22.0% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The locus (IWB75191) on chromosome 7B was significant in all six environments and BLUE value in both methods, whereas locus (AX_111037158) on chromosome 2A was stably detected in five and six environments and BLUE value in SNP-GWAS and Haplotype-GWAS, respectively.

Plant height related traits

Fourteen common loci for PH were identified on chromosomes 1A, 1B (2), 2A (2), 3A, 3B, 4D, 5A (2), 5B, 6B (2) and 7A, explaining 6.7–30.8% and 9.0–38.0% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The loci on chromosomes 1A (AX_109449226), 3B (AX_109413472) and 5A (AX_110446653) were significant in all six environments and BLUE values, whereas the other five loci on chromosomes 1B (AX_94564150 and AX_109820171), 3A (AX_111577195), 4D (AX_108916749) and 7A (AX_109384874) were stably identified in five or six environments and BLUE values in both methods; the 1B (AX_109820171) locus was the most significant, explaining 6.8–30.8% and 9.5–38.0% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively.

Twelve common loci for UIL were detected on chromosomes 1A (2), 1B (2), 3A, 5A, 6B (3), 6D (2) and 7B, with single loci explaining 6.7–16.4% and 9.1–24.8% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. Five loci on chromosomes 1A (AX_109449226), 5A (IWA5929), 6B (IWB12568 and AX_86165710) and 6D (AX_109331000) were identified in all four investigated environments and BLUE values by the two methods. Locus (AX_109820171) on chromosome 1B had a large effect on phenotypic variance in both methods.

Flag leaf related traits

Eight common loci for FLL on chromosomes 1A, 2A (2), 2B, 3A, 5A, 6B and 6D explained 6.9–19.6% and 9.0–29.1% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The 2A locus (AX_109880304) was significant in four or five environments and BLUE value and presented the largest effect on phenotypic variance in both methods. The 1A locus (AX_109621606) was also detected in four environments and BLUE value in both methods.

Five common loci for FLW were identified on chromosomes 1A, 3B, 5B (2) and 6B, accounting for 6.9–11.4% and 9.1–19.7% of the phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively. The locus on chromosome 5B (AX_109519234) was significant in four or five environments and BLUE value in both methods, whereas 1A (AX_111540798) and 3B (AX_111655083) loci explained the highest phenotypic variances in SNP-GWAS and Haplotype-GWAS, respectively.

Pleiotropic loci

Twelve pleiotropic loci were associated with three or more traits on chromosomes 1A, 1B (2), 2A (2), 2B, 3A, 5A (2), 5B (2) and 6D based on the common loci detected by both methods (Table 2). The interval 714.4–725.8 Mb on chromosome 3A was associated with GY, KNS, KW, SDW, PH, UIL and FLL, showing a significant effect on GY. Seven pleiotropic loci were associated with GY, among which four were related to KW and five to PH or UIL. Three SN loci on chromosomes 5B (IWB56499 and AX_109936345) and 6D (AX_110652999) were located in pleiotropic loci; four HD loci on chromosomes 2A (AX_111037158 and AX_111579921), 2B (AX_111634754) and 5B (IWB56499) were also located in pleiotropic loci; these loci were both accompanied with TKW or KW loci. Finally, nine pleiotropic loci for TKW or KW and seven loci for PH or UIL should be crucial in determining GY. Of all common loci identified by both methods, more than half were co-localized.

Table 2 Distribution of pleiotropic loci associated with three or more grain yield related traits on wheat chromosomes

Relationships between trait performances and number of alleles for increasing phenotypic values

For most traits, ranges in the number of alleles for increasing phenotypic values across the panel were large (Table 3). The average number of alleles for increasing GY was 10.0. Compared with the higher numbers of alleles for increasing TKW, KL, KW and FLW, those for SN, KNS, SL, SDW, HD, PH, UIL and FLL were lower.

Table 3 Number of alleles for increasing phenotypic values of grain yield and related traits in the diverse panel

Favorable alleles at each locus for GY exhibited significant and positive effects on phenotypic values (Fig. 1). Effects of number of alleles for increasing phenotypic values for each trait were also estimated (Fig. 2), and the results showed that the phenotypic traits were dependent on the number of alleles for increasing phenotypic value.

Fig. 1
figure 1

Effects of each locus on the phenotypic values for grain yield

Fig. 2
figure 2

Effects of the number of alleles for increasing phenotypic values of grain yield and related traits. GY, grain yield; SN, spike number per square meter; KNS, kernel number per spike; TKW, thousand-kernel weight; KL, kernel length; KW, kernel width; SL, spike length; SDW, spike dry weight; HD, heading date; PH, plant height; UIL, uppermost internode length; FLL, flag leaf length; FLW, flag leaf width

Discussion

Advantages of two methods of GWAS using high-density SNP markers

SNP arrays based on Next Generation Sequencing Technology permit identification of many SNP markers, and represent very high throughput and multiple genotyping compared with traditional molecular markers [24]. In differing from QTL mapping, GWAS is performed by significance testing between phenotypic values and single markers or haplotype blocks comprised of contiguous SNP markers with similar genotype. The accuracy of GWAS results thus depends on the coverage of markers used for analysis. In the present study, 326,570 SNP markers from the wheat 90 K and 660 K SNP arrays were used for GWAS of GY and related traits, with a physical distance of 0.043 Mb per marker. The average LD for the whole genome was 8 Mb, and the high-density of SNP markers ensured multiple markers in each haplotype block and high efficiency in identifying significant loci.

SNP are very common in the genomes of most crop species and result in a variety of genetic variances. However, genetic variance in crops can sometimes be caused by single SNP, but mostly there are numerous closely linked SNPs [30]. In order to avoid the disadvantage of SNP-GWAS in detecting genetic multiple variances caused by numerous SNP and false positives identified by Haplotype-GWAS, both methods were used in the present study to identify loci with significant effects. As already mentioned above, 275,000 of a total 326,570 SNP markers were sorted into 31,748 haplotype blocks, remaining 51,570 single SNP markers. A total of 239 and 248 significant loci were detected and about half the loci were common in both methods. This indicated that the detection intensity of SNP-GWAS and Haplotype-GWAS differs between chromosome positions. Loci identified in three or more environments in both methods were regarded as the main loci affecting GY and related traits.

Comparison with the QTL identified in previous studies

GY and related traits are basic observable and measurable agronomic traits extensively reported in the literature. Being limited to low-density molecular markers, significant influence by environments, and likely presence of linkage drag, marker loci for GY and related traits identified by QTL mapping or GWAS are seldom used in wheat breeding programs. In the present study, associations of GY and related traits with single SNPs and haplotype blocks were conducted separately. Loci identified by both methods were compared with QTL previously reported on physical or linkage maps.

GY and its components

GY related QTL have been reported on all 21 wheat chromosomes [18, 23, 33,34,35,36,37,38]. Azadi et al. [23] reported a GY QTL on chromosome 1A tightly linked with SSR marker gwm357, which was also located between the two GY QTL by Cuthbert et al. [18] and Huang et al. [33]. The 1A locus (AX_110418502) for GY is about 0.21 cM from gwm357 on the consensus linkage map [39], indicating that these two loci are likely to be the same. Reif et al. [35] identified a GY QTL on chromosome 5A linked with SSR marker barc151, at a similar position to the present GY locus (AX_110523824). The remaining loci are likely to be new.

Numerous reports indicate that SN is controlled by polygenes and significantly influenced by environment. Nine SN QTL were recently mapped using the wheat 90 K SNP array on three RIL populations [17]. QSN.caas-3AL.1 and QSN.caas-6AL were at similar positions to the QTL reported in Lee et al. [36] and Gao et al. [40], whereas the effect of QSN.caas-4BS was contributed by Rht-B1b. However, the six SN loci detected in this study are likely to be at different positions to the QTL reported previously.

Azadi et al. [23] detected KNS QTL on chromosomes 1A, 3A and 5B, linked with DArT markers wPt-665,590, wPt-5133 and wPt-3661, respectively; the 1A QTL is about 1.5 cM from the KNS locus (AX_108737858) identified in this study and they are likely to be the same; the 3A QTL is about 2 cM from the KNS locus AX_108992368 and close to a QTL mapped by Gao et al. [40]; the 5B QTL is about 6.5 cM from the KNS locus AX_109537496 and therefore might be different. Zhang et al. [41] identified SSR markers wmc63 and gwm213 significantly associated with KNS on chromosomes 2A and 5B, respectively; wmc63 is about 2 cM from the present 2A locus IWB45503 and close to a QTL reported by Kumar et al. [34] and Yao et al. [42]; gwm213 is at the same position as the present KNS locus AX_109538915. In addition, the locus AX_111579941 on chromosome 1A is about one LD from a QTL reported in Wang et al. [43]. The stable loci on chromosomes 3A (AX_110657474), 5B (AX_109537496) and 7A (AX_89571435) identified in five or more environments and BLUE values by both methods are likely to be new.

TKW locus AX_109917592 on chromosome 6B is within the confidence interval of QTKW.caas-6BL detected in the D × S (Doumai × Shi 4185) population in Li et al. [17]. The 7D locus (AX_109927697) is at the similar position to QKL.caas-7DS located in the G × Z (Gaocheng 8901 × Zhoumai 16) population [17] and QGw.ccsu-7D.1 reported by Mir et al. [44]. TKW locus AX_108769612 on chromosome 5B is at the same position as loci for KL, KW and TKW detected by Chen et al. [45], Mohler et al. [46] and Sun et al. [27], respectively, indicating that this should be an important locus in determining kernel weight. Wu et al. [47] reported a locus affecting both TKW and KL on chromosome 1B, located about one LD from the present TKW locus AX_111147652. Chromosomes 5A locus AX_110958315 is about one LD from a TKW QTL reported by Gao et al. [40], whereas 4B locus AX_110713957 is at a similar position to a QTL reported in Liu et al. [48]. The other six loci are likely to be new.

Kernel shape related traits

Sajjad et al. [49] cloned TaFlo2-A1 for TKW on chromosome 2A, at the same position as the present stable KL locus IWB32119. 1B locus AX_108849700 is within the confidence interval of QKL.caas-1BL mapped in the L × Z (Linmai 2 × Zhong 892) population in Li et al. [17] and within one LD of the significantly associated SNP marker tplb0043a07_1411 for TKW [48]. KL locus IWB50649 on chromosome 5B is very close to QTL reported by Azadi et al. [23] and Wu et al. [47], and within the interval of a TKW QTL mapped by Zhai et al. [50]. Locus AX_111122970 on chromosome 5D stably detected in five or six environments and BLUE value by both methods is probably new.

The co-localized KW and TKW locus AX_110958315 is at the same position as a TKW QTL on chromosome 5A reported by Gao et al. [40]. Another KW locus AX_109396082 co-localized with TKW locus AX_109927697 on chromosome 7D is at a similar position to QTL reported in Mir et al. [44] and Li et al. [17]. Wu et al. [47] reported three TKW or KL QTL on chromosomes 1A, 4A and 5A, which are within one LD from the KW loci IWB7676, AX_110046841 and AX_110523824, respectively; 4A locus AX_110046841 is also close to a TKW QTL mapped by Gao et al. [40]. The KW loci on chromosomes 1A (IWB6999), 2A (AX_111037158) and 3A (AX_111047166) are about one LD from TKW QTL reported by Mir et al. [44], Yao et al. [42] and Liu et al. [48], respectively. Locus IWB20926 on chromosome 5B is about 2 cM from DArT marker wPt-5851 linked to a TKW QTL in Azadi et al. [23]. The stable loci on chromosomes 2B (AX_111634754 and AX_111819405) and 3D (IWB17930) identified in five or six environments and BLUE values by both methods are likely to be new.

Spike related traits

QSL.caas-5AL.2 identified in the G × Z population [17] is at a similar position to the present SL locus AX_109367907. Sun et al. [27] reported SNP marker BS00022060_51 associated with SL on chromosome 2B. This gene is about one LD from the SL locus AX_109985540, indicating they are likely to be the same. Liu et al. [48] mapped a SL QTL on chromosome 5A about one LD from SL locus AX_110523824 in this study. The stable locus IWB71567 on chromosome 7B detected in five or six environments and BLUE value by both methods is likely to be new.

Compared with other traits there are few reports on QTL mapping of SDW. Li et al. [17] mapped 10 SDW QTL; among them QSDW.caas-6BL and QSDW.caas-7BL are at similar positions to SNPs RAC875_c31299_1302 and BS00055584_51 identified by Valluru et al. [28]. All five SDW loci identified in this study appear to be new.

Heading date

Le Gouis et al. [51] reported DArT markers wPt-1499, wPt-1409 and wPt-4796 associated with HD on chromosomes 2A, 5A and 7A, respectively; these three markers are close to the HD loci AX_109964711, IWB20926 and AX_111660137, respectively, on the consensus linkage map [39]. As the majority of varieties in the present study were from the YHRVWZ with similar vernalization and photoperiod characteristics, and no variation associated with known Vrn and Ppd genes was detected. Stable loci on chromosomes 2A (AX_111037158) and 7B (IWB75191) detected in most environments and BLUE values by both methods are likely to be new.

Plant height related traits

Rht-D1b is widely present in wheat varieties in YHRVWZ [6]. The PH locus AX_108916749 on chromosome 4D is at the same position as Rht-D1 [52], indicating that the effect on PH is from Rht-D1b, and is the same as QTL or loci reported by Li et al. [17], Sun et al. [27] and Gao et al. [40]. Loci AX_110988136 and AX_94494373 on chromosome 2A are at similar positions to QUIL.caas-2AS.1 and QPH.caas-2AL (co-localized with QUIL.caas-2AL), respectively [17]. Cui et al. [19] identified QTL for PH or UIL on chromosomes 3A and 3B; these QTL are close to the present PH loci AX_111577195 and AX_109413472, respectively. 3B locus AX_109413472 is about 14 cM from Rht5 [7] and therefore should be different. 5A locus IWA2646 is about one LD from a QTL in Li et al. [53], and about 25 Mb and 8.9 cM from Rht12 [54], respectively, on the physical and consensus linkage maps [39]. 5B locus AX_108921249 is about 2 Mb from Vrn-B1 [55], but there is no reported relationship between vernalization response and PH. The five loci identified in 1A (AX_109449226), 1B (AX_94564150 and AX_109820171), 5A (AX_110446653) and 7A (AX_109384874) identified in five or more environments and BLUE values by both methods are likely to be new.

The UIL locus AX_111610555, co-localized with PH locus AX_111577195, is likely to be the same as a QTL on chromosome 3A for both PH and UIL reported by Cui et al. [19]. Another UIL locus (AX_109331000) on chromosome 6D is about one LD from a QTL associated with PH and third internode length reported in Cui et al. [19]; they are likely to be the same. Apart from 3A locus, the remaining six loci co-localized with PH loci are likely to be new.

Flag leaf related traits

Wu et al. [56] mapped a FLL QTL on chromosome 6D that overlapped with FLL locus AX_110876641. They also reported a pleiotropic locus for FLW and flag leaf angle at about one LD from the present 5A FLL locus IWB4576. Another FLL QTL linked with the SSR marker barc318 identified on chromosome 2B [57] is about 1.2 cM from the present FLL locus AX_111027654 based on the consensus linkage map [39]. Loci on chromosomes 1A (AX_109621606) and 2A (AX_109880304) that were stable in four or more environments and BLUE values by both methods are probably new.

A FLW QTL mapped on chromosome 6B by Wu et al. [56] is at the same position as AX_108771909, and are probably the same gene. Two stable loci on chromosomes 1A (AX_111540798) and 5B (AX_109519234) identified in four or five environments and BLUE values in both methods are likely to be new.

Among the 120 loci for GY and related traits, 42 could be the same as QTL reported in previous studies, whereas the remaining are likely to be new. Stable loci identified in both GWAS and QTL mapping showed that they are widespread in varieties. Our results indicated that the methods of GWAS used in the present study were reliable and efficient in detecting loci for GY and related traits.

Genetic relationships among grain yield and related traits

High-yielding varieties should have good adaptability to prevailing environments, strong resistance to abiotic and biotic stresses, and highly coordinated agronomic traits. Previous studies have showed that improvements in agronomic traits made significant contributions to increased yield potential [4,5,6]. Many studies have reported interaction effects or genetic linkages among yield related traits, especially in regard to the reduced height loci Rht-B1 and Rht-D1 [17, 18, 40, 41, 58]. In the present study, 12 pleiotropic loci involving three or more traits were identified, and more than half of the common loci were co-localized. Previously, three QTL clusters associated with yield related traits were detected at different positions on chromosome 3A [40, 41, 58]; among these the QTL cluster detected by Xu et al. [58] overlapped with the pleiotropic locus IWA94 in the present study. Many studies have reported that chromosome 5A carries productivity and adaptability related genes [18, 33, 59, 60]. Li et al. [17], Cuthbert et al. [18], Zhang et al. [41] and Liu et al. [48] all reported QTL clusters for yield related traits at different positions on chromosome 5A; however, they are likely to be different from two pleiotropic loci detected in this study. Another locus on chromosome 1B related to GY, PH and UIL is about 15 Mb from the QTL cluster for KNS, KL, PH and FLW identified in Li et al. [17].

Relationships between GY and yield components are discussed in several publications [18, 58, 61,62,63]. Many studies demonstrated that GY is significantly correlated with SN and KNS. For example, by unconditional and conditional QTL analysis, Xu et al. [58] found that spike number per plant and KNS have larger effects on GY than TKW. Miralles and Slafer [63] reviewed reports on factors influencing GY and concluded that increased GY was associated with increased grain number, but associated with a negative relationship between grain number and grain weight. Huang et al. [61] and Li et al. [62] reported that GY was significantly correlated with kernel size. However, in the present study, co-localization of related loci and phenotypic correlations showed that TKW and KW were more highly correlated with GY than were SN and KNS. Recently, McIntyre et al. [64] detected six putative QTL that increased grain weight and co-located with QTL for SN, KNS and harvest index, and three putative QTL for increased KNS co-located with QTL for increased grain weight, fewer spikes and earlier flowering. In this study, three loci associated with SN and TKW showed opposite effects on these traits due to negative correlation.

Keyes et al. [65] reported that plants with the Rht-B1b, Rht-B1e and Rht-D1b alleles are GA-insensitive, and the reduced PH was induced by decreased sensitivity of their vegetative tissues to endogenous gibberellin (GA). Chebotar et al. [66] pointed out that both GA-sensitive (Rht8) and GA-insensitive (Rht-B1 and Rht-D1) dwarfing alleles had effects on almost all investigated traits. Our earlier study on QTL mapping of yield related traits showed that the Rht-B1 and Rht-D1 loci, as well as other PH QTL, had significant influences on other traits [17]. In the present study, more than half of the PH and UIL loci were co-localized with other traits, indicating that genes underlying have multiple effects on other traits, including GY.

The growth of wheat is controlled by many genes expressed at different growth stages. Heading and flowering represent a node of spike development and grain-filling, and are affected by environmental conditions as well as the many genes associated with plant development [67]. As a result, HD is crucial in optimising agronomic traits like kernel and spike related phenotypes. However, in the present study, HD exhibited no significant correlations with traits other than FLW. Through co-localization, early heading is likely to benefit kernel development at lower temperatures.

Flag leaves account for 45–58% of the total photosynthetic activity of the plant and contributed 41–43% of the carbohydrates required for grain-filling [68, 69]. Previously, Li et al. [17] found that FLW was important in determining KNS. In the present study, FLL was negatively correlated with GY, whereas it was positively correlated with PH and UIL. However, only few FLL loci were co-localized with loci for GY, PH or UIL. FLW was negatively correlated with SN, but positively correlated with KNS and SDW.

Potential implications in wheat breeding

The YHRVWZ is the major wheat growing area in China, producing ~ 65% of national production [4]. Comparison of the 20 highest-yielding and other varieties in the germplasm panel showed that KNS, TKW, KW, SDW and FLW in the high-yield group were 2.0, 5.6, 2.6, 7.0 and 4.1%, respectively, higher than the other group, whereas PH, UIL and FLL were 3.5, 10.1 and 5.6% lower. The numbers of alleles for increasing phenotypic values for each trait assessed in the panel were in agreement with the results mentioned above and in favor of Xiao et al. [5] and Gao et al. [6]. However, with the anomaly change of climate and decreased use value of germplasm, yield potential of new varieties is increasing slowly in this area. As a result, new methods and technologies that assisted in selection are essential for further improvement of GY.

High-yielding lines are difficult to select in the early stages of breeding programs as significantly influenced by other traits and environments. Li et al. [17] showed that FLW can be used to select lines with large KNS. In the present study, UIL showed a significant, negative correlation with GY, indicating that larger UIL was associated with decreased carbohydrate transportation to grain. FLL, significantly and positively correlated with UIL, also showed a significant, negative association with GY. SN and KNS were significantly and negatively correlated with each other, as reflected by FLW. Larger FLW was significantly associated with larger KNS and smaller SN in the same variety. Therefore, selection for shorter UIL and FLL would be helpful in selection for higher GY of wheat lines, whereas FLW is convenient to reflect SN and KNS.

Favorable alleles at each locus affecting GY exhibited positive effects on phenotypic values. As a result, the GY loci are valuable for selecting high-yielding varieties in breeding programs. The alleles for increasing phenotypic values presented significant additive effects on each trait, indicating that pyramiding favorable alleles is feasible to improve trait performances using the loci listed in Table 1. Besides, the 12 pleiotropic loci are important in determining GY and related traits, especially the loci that related to GY; the eight loci for TKW (2), KL, KW, SL and PH (3) that at similar positions with the QTL identified in our previous study are also credible. As GY related traits are mostly controlled by polygenes with small effect each, a genome-wide selection would be more powerful in gene discovery and pyramiding breeding with high-density genetic markers or genotyping by sequencing in future. However, MAS may be more feasible as long as only a few QTL need to be tracked in wheat breeding.

Among the 11 varieties with GY potential higher than 8200 kg ha− 1, Luyuan 502, Luomai 21, Yannong 18, Shannong 20, Zhongmai 875 and Wanmai 52 possess all 12 favorable alleles for GY. They are good parents to develop new high-yielding varieties. Four varieties, Lumai 8, Zhou 8425B, Zhongmai 875 and 85 Zhong 33 have large TKW, with more than 10 favorable alleles for that trait. These varieties should be valuable germplasms to develop large kernel varieties and for cloning genes related to TKW. Lankao 906 has large spikes with an average KNS of 60.2 and possesses all the favorable alleles identified in the present study for KNS. As KNS in the YHRVWZ is currently not large, this variety can be used to improve KNS. The superior germplasm and favorable alleles of markers identified or confirmed in this study can be used in breeding new high-yielding varieties.

Conclusions

In the present study, SNP-GWAS and Haplotype-GWAS for GY and related traits, were performed in a diverse panel of 166 varieties with the wheat 90 K and 660 K SNP arrays. One hundred and twenty loci were identified by two methods, and 78 of these are likely to be new. Varieties with higher yield potential identified in the study can be used as parents in breeding programs aimed to accumulate further favorable alleles by marker-assisted selection. Our study proved that two GWAS methods with high-density SNP markers were reliable in identifying genes for GY and related traits, and provided new insight into the genetic architecture of GY.

Materials and methods

Plant materials and field trials

The diverse panel used in the present study contained 166 varieties, comprising 144 accessions from the YHRVWZ of China, and 22 accessions from other countries [26].

The diverse panel was grown at Anyang in Henan province and Suixi in Anhui province during the 2012–2013 and 2013–2014 cropping seasons, and at Shijiazhuang in Hebei province and Suixi in Anhui province during the 2014–2015. A randomized complete block design with three replicates was employed in field trials. Each plot comprised three 1.5 m rows spaced 20 cm apart, with 50 plants in each row. Agronomic management was performed according to local practices at each location. All wheat accessions are deposited in the National Genebank of China, Chinese Academy of Agricultural Sciences, and available after approval. All wheat varieties were collected in accordance with national regulations, and the experiments comply with the ethical standards and legislations in China.

Phenotyping and statistical analysis

Thirteen phenotypic traits, GY, SN, KNS, TKW, KL, KW, SL, SDW, HD, PH, UIL, FLL, and FLW were assessed in the diverse panel (Additional file 1: Table S1).

All plants were harvested in each plot at physiological maturity and GY as kg ha− 1 were measured when the moisture declined to 14%. Investigation of the other 12 traits and statistical analyses followed Li et al. [17]. The phenotypic traits GY, KNS, TKW, KL, KW, SL, HD, and PH were assessed in all six environments, whereas data for FLL and FLW and those for SN, SDW and UIL were obtained in five and four environments, respectively. The phenotypic values in each environment and BLUE values were used for GWAS.

Genotyping, quality control and construction of the physical map

The diverse panel was genotyped using both the wheat 90 K SNP and 660 K SNP arrays [26]. Minor allele frequency (MAF), genetic diversity and PIC were calculated using PowerMarker v3.25 (http://statgen.ncsu.edu/powermarker/). To avoid spurious alleles, SNP with missing data > 20% and MAF < 0.05 were removed. Flanking sequences of SNPs were used to blast against the CSS database (IWGSC RefSeq v1.0, https://urgi.versailles.inra.fr/blast_iwgsc/blast.php) to identify their positions on the physical map. Markers from the two SNP arrays were ordered based on their positions on chromosomes and integrated into a common physical map for GWAS.

Haplotype analysis

Based on 4 gametes and default parameters as used by the Haploview 4.2 software package (http://www.broadinstitute.org/haploview/haploview), genome-wide haplotype blocks were constructed with PLINK. The number of haplotypes, genetic length (bp) for each block, and the number of tag SNPs based on the ‘solid spine’ of LD were also provided (Extend spine if D′ > 0.8). Haplotype frequency was calculated using a custom Perl script and haplotypes with low frequency (F <0.05) were removed.

Population structure and linkage disequilibrium

The SNP markers and estimated methods for population structure and LD were the same as in Liu et al. [26]. For population structure, 2000 polymorphic SNP markers evenly distributed on all 21 chromosomes were analyzed in Structure v2.3.4 [70] (http://pritchardlab.stanford.edu/structure.html). PCA and NJ trees were estimated using the software Tassel v5.0 [71] and PowerMarker v3.25 [72] (http://www.maizegenetics.net), respectively, to verify the results.

A total of 12,324 evenly distributed SNP markers were chosen to calculate LD for the A, B and D and entire genomes using the full matrix and sliding window options in Tassel v5.0 [73].

Genome-wide association studies

SNP-GWAS and Haplotype-GWAS were used to identify the associations between phenotypic and genotypic data. For SNP-GWAS, the mixed linear model (MLM) in Tassel v5.0 was used including kinship matrix and population structure. The kinship matrix was treated as a random effect and calculated by the Tassel v5.0 software, whereas the subpopulation data was considered a fixed effect and estimated by Structure v2.3.4 in MLM analysis. The P value indicated the degree of association between a SNP marker and a trait, and the R2 was the variation explained by the significantly associated markers. As the Bonferroni-Holm correction for multiple testing (α = 0.05) was too conserved for the traits in the present study, markers with an adjusted -log10 (P-value) ≥ 3.0 were regarded as significant for all traits. For Haplotype-GWAS, PLINK was used in consideration of population structure. According to the results, markers with -log10 (P-value) ≥ 4.0 were considered to be significant. Manhattan plots for both methods were drawn using the ggplot2 code in R Language with the P value estimated between the marker and trait in Tassel v5.0 and PLINK. In both cases loci identified in one-half or more environments were taken as stable.

Loci position comparison

For each trait, significant SNP markers within one LD on the same chromosome and identified by the same method were considered to represent one locus. Overlapping loci identified by the two methods for same trait were regarded as common loci. For loci or QTL reported in previous studies, two steps were followed to decide whether currently identified loci were the same as previously found. Firstly, the sequences of the tightly linked or significant markers of the QTL or loci were used to blast against the CSS database (IWGSC RefSeq v1.0, https://urgi.versailles.inra.fr/blast_iwgsc/blast.php). If the marker was less than one LD from the locus for the same trait detected in the present study, they were considered to be the same. Secondly, the consensus linkage map constructed by Maccaferri et al. [39] was used to compare different types of markers. Therefore, loci or QTL were considered to be the same if the tightly linked or significantly associated markers were less than 2.1, 1.2 and 3.9 cM from each other on the A, B and D genomes, respectively.

Effects of alleles on grain yield and related traits

For each common locus, the most significant SNP markers and haplotypes were chosen as representative markers and haplotypes. The effects of each locus on phenotypic values for GY and the effects of the number of alleles for increasing phenotypic values for each trait were estimated based on the representative markers using R Language.