Introduction

The industrial revolution and anthropogenic activities led to the widespread distribution of heavy metals. Heavy metal concentrations that are over allowed limits can result in a major ecological disaster (Facchinelli et al. 2001; Sofianska et al. 2013). The accumulation of heavy metals and metalloids in soils can result from a variety of sources, including emissions from rapidly growing industrial areas, mine tailings, the disposal of high metal wastes, leaded paint and gasoline, the land application of fertilizers, animal manures, sewage sludge, pesticides, wastewater irrigation, coal combustion residues, spills of petrochemicals, and atmospheric deposition (Khan et al. 2008; Zhang et al. 2010). Lead (Pb), tin (Sn), chromium (Cr), arsenic (As), zinc (Zn), cadmium (Cd), copper (Cu), mercury (Hg), and nickel (Ni) are the heavy metals most typically discovered in contaminated sites. In soil ecosystems, heavy metals, unfortunately, exist forever since they are biologically inert elements. Metals can enter the food chain from plants, increasing the likelihood that they will be hazardous to both human and animals (Roy and Mcdonald 2015; Maiti et al. 2016).

Dangerous heavy metals such as lead (Pb) and tin (Sn) are present in soils and irrigation water, including wastewater and Nile River water, at levels higher than those considered safe for human health (Elgharably and Mohamed 2016; El-Bady and Metwally 2019; Mekky et al. 2019; Mourad et al. 2021b). High level of Pb was reported in the soils of Asyut, El-Mina and Delta (Salman et al. 2019). Recently, the presence of high level of Sn in water irrigation and agriculture soils has been reported worldwide (Noubissié et al. 2016, 2017). In addition to influencing different morphological, physiological, and biochemical processes in plants, Pb in soil poses a threat to human health through food chains (Xiong et al. 2014; Khan et al. 2021).

Wheat grain yield improvement is one of the most challenging breeding objectives due to the complex genetic architecture and low heritability of grain yield, and it is typically the most important breeding objective (Eltaher et al. 2021a, 2022). It is very sensitive to heavy metal stress (Ghani et al. 2014). It was reported that when wheat plants absorbed heavy metals, the stress-induced different responses in plants and affected all growth stages, biochemical responses, and yield losses in wheat. Pb and Sn are important heavy metals that reduced the morphological traits, such as length (root and shoot), shoot fresh and dry matters, and the number of tillers per plant (Sindhu et al. 2015). Moreover, they affect wheat production and productivity by reducing kernel traits such as length, diameter, width, and weight. Furthermore, both metals were found to differentially influence the physiological processes in wheat. Both metals were found to increase hydrogen peroxide (H2O2) and lipid peroxidation (MDA) concentrations in the flag leave and decreasing the guaiacol peroxidase (POD) concentration. Sn was found to slightly increases ascorbate peroxidase (APX) and catalase (CAT) in the flag leaf, such response was not found under Pb conditions (Mourad et al. 2021b).

The initial step in the safe exploitation of Pb and Sn polluted soil will be the development of new cultivars with tolerance to Pb and Sn toxicity (Grant et al. 2008; Ali et al. 2013; Gupta et al. 2013; Zhang et al. 2020). This can be achieved through screen wheat germplasm under both stresses. In our previous study by Mourad et al. (2021b), huge genetic variation in Pb and Sn tolerance was found among a set of 103 highly diverse spring wheat genotypes. We studied the effect of Pb and Sn on kernel traits such as diameter, width, length, and weight. Such high genetic variation was used for the selection of Pb and Sn stress tolerance. Two sets of twenty genotypes each were found to be tolerant to Pb, and Sn, respectively. These genotypes can be used for crossing in future breeding programs to produce wheat cultivars with high tolerance to Pb and Sn stresses. However, traditional breeding programs are time-consuming and laborious (Salem and Sallam 2016; Mourad et al. 2019; Sallam et al. 2019; Ahmed et al. 2021). Integrating the genetic analyses along with the breeding program will accelerate the genetic improvement of tolerance to Pb and Sn stresses. Genome-wide association study (GWAS) aims to identify alleles that are significantly associated with target traits (Sallam and Martsch 2015; Mourad et al. 2018a, b, 2022; Alqudah et al. 2020; Abou-Zeid and Mourad 2021). Then, candidate genes can be detected. Identifying candidate genes controlling Pb and Sn stress tolerance will help to truly select the most promising tolerant wheat genotypes to be used for crossing by identifying the number of candidate genes within each genotype. Crossing among genotypes that are highly divergent and have as many candidate genes for target traits will accelerate the breeding program and ensure the genetic improvement of Pb and Sn tolerance by producing wheat cultivars having most candidate genes from the cross and hence showing more tolerance to Pb and Sn. It was reported that phenotypic selection supported by genetic diversity analyses and the identification of target genes in the selected genotypes are very important for the fruitful production of superior wheat cultivars with desirable traits (Hussain et al. 2017; Abou-Zeid and Mourad 2021; Ahmed et al. 2021; Eltaher et al. 2021a; Hasseb et al. 2022).

In the current study, we conducted GWAS for kernel traits scored under control, Pb stress, and Sn stress that were reported by Mourad et al. (2021b) to (1) identify candidate genes associated with Pb and Sn tolerance in a set of 103 spring wheat genotypes, and (2) genetically select the most tolerant genotypes having as many as candidate genes for crossing in a future breeding program. The best tolerant genotypes that are selected phenotypically in Mourad et al. (2021b) and genetically in this manuscript could be considered as a promising parents that could be used in future breeding programs to improve Pb/Sn tolerance under the Egyptian conditions. To the best of our knowledge, there are very few studies that identified QTL or candidate genes associated with Pb tolerance and no study on QTL associated with Sn tolerance in wheat.

Materials and methods

Plant materials, experimental design, and phenotyping of kernel traits

The studied plant materials, experimental design, phenotyping, and statistical analysis were explained in details in our previous manuscript (Mourad et al. 2021b). In brief, a set of 103-highly diverse spring wheat genotypes collected from 14 different countries worldwide were evaluated for their kernel traits under three different soil conditions, controlled, lead contaminated soils and tin contaminated soils. The complete list of the tested genotypes is presented in Table S1. The evaluation was done for one season (2019/2020) by planting the studied materials in pots at the field station of Agronomy Department, Faculty of Agriculture, Assuit University. The experimental design was replicated complete block design (RCBD) with three replications/condition. At the end of the growing season, heads were threshed and kernel traits such as kernel length (KL), kernel diameter (KD), and kernel width (KW) were scored on five randomly selected seeds from each genotype/replication. In addition, thousand kernel weight (TKW) was measured for each genotype/replication. The best ten genotypes based on each studied kernel trait were selected using the iPASTIC toolkit based on the average of sum ranks (ASR). The selection was done phenotypically based on the response of each genotype to Pb and Sn contamination in comparison with the controlled conditions.

Genotyping of the tested plant materials

The 103 tested genotypes were genotyped using two different types of molecular markers, 25K Infinium iSelect array (will be identified hereinafter as 25K-SNPs) and Genotyping-by-sequencing (will be identified hereinafter as GBS). GBS set was the same set used in our previous studies that consisted of 36,720 SNP markers for all the studied genotypes (Mourad et al. 2020; Abou-Zeid and Mourad 2021). The 25K-SNPs was generated by SGS Institute Fresenius GmbH TraitGenetics Section (Gatersleben, Germany) as described in Aleksandrov et al. (2021).The two marker sets were reported to cover different parts of the genome (Esamil et al. 2022).

Genome-wide association study (GWAS) for kernel traits under each condition

In the presented study, GWAS for the four studied kernel traits (KL, KD, KW, and TKW) was performed using rMVP R package (Yin et al. 2021) following three different models, Mixed Linear Model (MLM), generalized liner model (GLM), and fixed and random model circulating probability unification (FarmCPU). In each studied model, kinship (Kin), principal coordinate analysis (PCA), PCA + Kin were included separately to identify the best model fits the studied trait based on the distribution of the expected and observed p-values in the QQ-plot. Furthermore, stress tolerance index (STI) for each studied trait under Pb and Sn was included in the GWAS using the same models to investigate the whole genetic control of such important kernel traits. The significant markers associated with the studied traits were identified using a p-value < 0.001 (-log10 > 3.00). The phenotypic variation explained by each marker was detected using TASSEL 5.2.19 software (Bradbury et al. 2007). The targeted marker allele was identified as the one that increase KL, KD, KW, TKW as well as STI for each of these studied traits.

Gene models controlling kernel traits, their functional annotation, network in relation to kernel size, and expression

To further investigate the genetic control of the kernel traits under the studied conditions, gene models harboring the identified significant markers ware investigated by checking the base pair position of the markers and the presence of gene models in the same position using the EnsemblePlants database (https://plants.ensembl.org/Triticum_aestivum/Info/Index.). The functional annotation of the identified gene models was detected using International Wheat Genome Sequencing Consortium (IWGSC) V.1.0. Furthermore, the genetic base of these gene models in relation to kernel development was investigated and presented as a network using KnetMiner database (https://knetminer.com/Triticum_aestivum/). The gene expression under controlled and abiotic stress conditions was detected using Wheat Expression Browser (http://www.wheat-expression.com/).

Gene enrichment and biological process pathway of gene models

To provide more understanding on the genetic control of kernel traits under normal conditions as well as under Pb and Sn heavy metals, gene enrichment based on the biological process of the identified gene models was tested using ShinyGo 0.76 database (http://bioinformatics.sdstate.edu/go/). A cutoff value of false discovery rate (FDR) p-value < 0.01 was applied. Moreover, the network of the biological process was detected and their genetic control was visualized using NetworkD3 V.4.0, R package (Allaire et al. 2017).

Results

The genetic variation among genotypes, analysis of variance, correlation analyses, and the effect of Pb and Sn on each kernel trait were extensively presented in Mourad et al. (2021b). To sum up, highly significant differences were found between the controlled and Pb/Sn conditions for all the studied kernel traits. Furthermore, highly significant differences were found among the tested genotypes for all the studied kernel traits under Pb/Sn conditions. High degrees of broad-sense heritability were found for all the studied traits with values ranged from 0.88 to 0.98 under Pb conditions and 0.89 to 0.99 under Sn conditions. A summary of the analysis of variance, broad-sense heritability and standard deviation of each studied kernel traits under each condition is presented in Table S2.

Association mapping of kernel traits

The tested population showed highly significant differences among the three conditions (Ctrl, Pb, and Sn). Therefore, GWAS was conducted for each condition separately. The presence of population structure in the recent tested population was reported previously and three subpopulations were detected (Mourad et al. 2020, 2022; Abou-Zeid and Mourad 2021). Therefore, PCA and Kin were included in the tested models to correct the effect of population structure.

A summary of the significant markers identified for each studied trait under each condition is presented in Table 1. Furthermore, a detailed information on the significant markers associated with these traits is presented in Tables S3, S4, and S5. Notably, all the 21-wheat chromosomes were found to carry significant markers associated with kernel traits, except chromosome 4D that carried significant markers associated with kernel traits under heavy metals conditions only. The highest number of significant markers associated with kernel traits under controlled conditions was found on chromosomes 2A, 2B, and 5A with a number of 13 markers on each chromosome (Fig. 1a). Under heavy metals conditions, the highest number of significant markers was 25 markers on chromosomes 1A and 2B and 18 markers on chromosome 6A under Pb and Sn stress, respectively.

Table 1 Summary of significant markers associated with kernel traits (kernel diameter (KD), kernel length (KL), kernel width (KW), and thousand kernel weight (TKW)) under controlled conditions (Ctrl), lead contaminated soils (Pb), tin contaminated soils (Sn), and Stress tolerance index (STI) for each trait under Pb (STI_Pb) and Sn (STI_Sn) conditions
Fig. 1
figure 1

(a) Number of significant markers associated with the different studied kernel traits identified on each chromosome of the 21 wheat chromosomes (Un: Unknown chromosome). Venn diagrams represent the number of significant markers associated with each of the studied kernel trait under (b) controlled, (c) lead, and (d) tin conditions

Genome wide-association study for kernel traits under controlled conditions

Under controlled conditions, the QQ-plot results represented that the best models for the 25K-SNPs marker data set was FarmCPU + Kin for all the studied traits except KL that was best fitting into FarmCPU + PCA model (Fig. S1a). While the best model for the GBS data was FarmCPU + Kin for all the studied traits except TKW that was best fit into FarmCPU + PCA + Kin (Fig. S1b). Based on the 25K-SNP results, 19, 28, 19, and eleven markers were associated with KD, KL, KW, and TKW, respectively (Table S3). While eleven, 13, seven, and 23 markers associated with KD, KL, KW, and TKW, respectively based on the GBS. Combining the results of both marker sets together, 31, 41, 26, and 34 markers were associated with KD, KL, KW, and TKW, respectively (Table 1). The highest and lowest phenotypic variation explained by these significant markers were found in KD trait with a value ranged from 1.04–52.76% (Table 1). Number of minor markers (with R2 ≥ 10%) ranged from 11 markers for TKW to 19 markers for KL. On the other hand, number of major markers (with R2 < 10%) ranged from nine for KW to 22 for KL. Interestingly, no common markers between the four studied kernel traits was found (Fig. 1b).

Genome wide-association study for kernel traits under lead (Pb) contaminated conditions

Using 25K-SNPs, the best models were MLM + PCA + Kin for both KL and KW, FarmCPU + PCA for KD, and FarmCPU + PCA + Kin for TKW (Fig. S2a). While the best models for GBS markers were FarmCPU + Kin for both KD and KL, FarmCPU + PCA + Kin for KW, and FarmCPU + PCA for TKW (Fig. S2b). These models identified 33, 15, 36, and 42 significant markers associated with KD, KL, KW, and TKW, respectively (Tables 1 and S4). The phenotypic variation explained by these markers (R2) ranged from 1.22% for KL to 43.46% for KW. The lowest number of significant markers was associated with KL (15 marker) that could be classified into seven minor and eight major markers. On the other hand, the highest number of significant markers was detected for TKW (42 markers) that could be classified into 20 minor and 22 major markers (Table 1). Notably, no common markers were found to be associated with more than one kernel trait (Fig. 1c).

For further investigation of kernel traits’ genetic control under Pb conditions, GWAS was conducted using STI values. Using 25K-SNP markers, the best GWAS models were FarmCPU + PCA + Kin for STI for all the studied traits except KL that was fitted into FarmCPU + Kin model (Fig. S3a). The best models fitted with the studied traits using GBS were FarmCPU + Kin for KL and KD, and FarmCPU + Kin + PCA for KW and TKW (Fig. S3b). These models identified a total of 43, 23, 57, and 30 significant markers associated with KD, KL, KW, and TKW, respectively with a total of 150 significant markers (Table 1). Out of these significant markers, only three markers were found to be common between STI for KW and TKW (Fig. S4a). Combining the markers significantly associated with the studied kernel traits under Pb (121 markers) and the significant marker associated with STI (150 markers), 52 markers were significantly associted with both kernel traits and STI for these traits (Fig. S4b).

Genome wide-association study for kernel traits under tin (Sn) contaminated conditions

For Sn treatment, the best GWAS model using 25K-SNP were MLM + PCA + Kin for KD and TKW, FarmCPU + PCA for KL, and MLM + PCA for KW (Fig. S5a). Furthermore, using GBS, the best GWAS models were FarmCPU + Kin for KD and KL, and FarmCPU + PCA + Kin for KW and TKW (Fig. S5b). Based on these models a number of 31, 27, 27, and 30 significant markers were identified to be significantly associated with KD, KL, KW, and TKW, respectively (Tables 1 and S5). The lowest phenotypic variation explained by these markers was 1.15% for TKW while the highest phenotypic variation was found for KL with a percentage of 40.41%. These significant markers could be classified into minor markers with a number of 13, 14, 15, and 22 for TKW, KL, KW, and KD, respectively, and major markers with a number of nine, 12, 13, and 17 markers for KD, KW, KL, and TKW, respectively. Out of the identified significant markers, one marker (S2B_35274349) was significantly associated with the KD and KW, one marker (S6B_196678015) was significantly associated with KW and TKW, and two markers (AX-158550223 and BobWhite_c7157_633) were common between KL and TKW (Fig. 1d).

To provide more information on the genetic control of kernel traits under Sn, GWAS was conducted for STI. Using 25K-SNPs, the best GWAS models were MLM + PCA + Kin for KD and KW, and MLM + Kin for KL and TKW (Fig. S6a). While using GBS the best model was FarmCPU + PCA for all the traits except KW that fitted well under the FarmCPU + PCA + Kin model (Fig. S6b). Based on these models, 32, 21, 21, and 31 markers were significantly associated with KD, KL, KW, and TKW, respectively (Table 1). These markers were found to explain a high range of the phenotypic variation for all the traits and classified into minor markers with a number of three, six, 15, and 23 markers for Kl, KW, KD, and TKW, respectively, and major markers with a number of nine, 15, 16, and 18 markers for KD, KW, TKW, and KL, respectively. Out of these identified markers only two markers (BS00101408_51 and Excalibur_c74925_338) were associated with KD and KW, four markers (S3B_30092319, S3B_30092357, S3B_30092366, and S6B_196678015) were associated with KD and TKW, and two markers (wsnp_BE605194B_Ta_2_7 and S6D_436716768) associated with KD, KW, and TKW (Fig. S7a). In total, a set of 111 markers and 95 markers were identified to be significantly associated with STI under Sn, and kernel traits under Sn, respectively with a number of 39 common markers (Fig. S7b).

Genetic control of kernel traits under the different conditions

Gene models harboring the significant markers under each treatment, their functional annotation, and gene network

To further understand the genetic control of kernel traits under normal, Pb and Sn conditions, gene models harboring the identified significant markers were investigated. Under controlled conditions, a total number of 64 gene models were detected with a number of 11, 13, 14, 23 gene models harboring significant markers for TKW, KD, KW, and KL, respectively (Table 1). The functional annotation of these gene models was investigated and presented in Table S3. Furthermore, gene network of these gene models in relation to grain development was investigated (Fig. S8). Out of these gene models some were directly associated with kernel traits such as TraesCS2A02G080700 (associted with grain yield), TraesCS2D02G408400 (associated with TKW, grain number, grain size, grain density, and grain weight), TraesCS5A02G432900 (associated with grain size), and TraesCS7A02G557400 (associated with TKW) (Fig. S8).

Under Pb contaminated soils, the identified significant markers were found to be located within 141 gene models with a number of 21, 10, 20, 24 genes for KD, KL, KW, and TKW, respectively (Table 1). The functional annotation of these gene models was investigated and presented in Table S4. Gene networks of these gene models in relation to kernel development were investigated (Fig. S9). Furthermore, 28, 16, 35, and 18 gene models were found to contain significant markers associated with STI for KD, KL, KW, and TKW, respectively (Tables 1 and S4). In total, 141 gene models were found to control kernel traits and/or STI for kernel traits under Pb contaminated conditions.

Under Sn contaminated soils, 14, 20, 16, and 15, gene models were harboring significant markers associated with KD, KL, KW, and TKW, respectively. The functional annotation and gene network of these genes were investigated and presented in Table S5 and Fig. S10. Furthermore, 16, 12, 11, and 13 genes were harboring significant markers for STI of KD, KL, KW, and TKW under Sn, respectively. The functional annotation and network of these genes were investigated and presented in Table S5 and Fig. S10, respectively. In total, 83 gene models were found to control kernel traits and/or STI for kernel traits under Sn contaminated conditions.

Out of the 64, 141, and 83 gene models controlling kernel traits under controlled, Pb, and Sn conditions, respectively, only one gene model, TraesCS2A02G080700, was found to be common under the three conditions (Fig. 2a). TraesCS2A02G080700 was significantly associated with KL under controlled conditions as well as STI for KL under both Pb and Sn. The target allele of the significant marker located within this gene model “A” was found to explain 19.6%, 32.88% and 26.52% of the phenotypic variation under Ctrl, Pb and Sn, respectively and increase the length of the kernel with an average of 0.04 mm, 0.10 mm and 0.11 mm under Ctrl, Pb and Sn, respectively (Fig. 2b, Tables S3, S4 and S5). This gene model was functionally annotated to produce kinase interacting (KIP1-like) family protein. The expression of this gene model was investigated under controlled and abiotic stress conditions and higher expression was found under abiotic stress in the roots and shoots. However, lower expression of this gene was found in the spike under abiotic stress compared with the controlled conditions. Based on the gene network, this gene played a role during the grain filling duration and mature plant embryo stage. The expression map of this gene model represented the expression level of this gene in the different plant parts (Fig. S11). Higher expression was found in the plant roots/shoots in the seedling growth stage, flag leaf, spikes, spikelet, awns, lemma, glumes, and ovary under abiotic stress conditions.

Fig. 2
figure 2

a Networks represents the number of gene models associated with kernel traits under lead (Pb), tin (Sn), and controlled conditions (Ctrl). b chromosomal location, marker allele effect, gene expression, and gene network of the common gene model between Pb and Sn

Furthermore, one gene model TraesCS3A02G516400, was common between Ctrl and Pb conditions and was found to be significantly associated with KL under both conditions. It was functionally annotated to control ABC transporter ATP-binding protein. The target allele of the 25K-SNP located within this gene model was found to explain a percentage of 15.19% and 11.99% of the phenotypic variation under Ctrl and Pb, respectively and increase the kernel length with 0.03 mm under both conditions. The network of this gene represented its association with grain size and seed maturation (Fig. S12). Furthermore, higher expression of this gene model was found in the roots and leaves under abiotic stress conditions compared with controlled conditions. However, in the spike, higher expression was found under the controlled conditions compared with abiotic stress conditions. The expression map of this gene model represented its higher expression in the different plant growth stages including grains under abiotic stress conditions.

In addition, five gene models (TraesCS4A02G334100, TraesCS5A02G424400, TraesCS5B02G323600, TraesCS6B02G178000, and TraesCSU02G048500) were commonly associated with kernel traits under Ctrl and Sn conditions. These gene models were functionally annotated to control Glycine rich protein 3, Peroxisomal (S)-2-hydroxy-acid oxidase, Root cap/late embryogenesis-like protein, Fatty acid hydroxylase superfamily, and Cytochrome, respectively (Tables S3, S4 and S5). Based on the gene network, gene models TraesCS5A02G424400 and TraesCSU02G048500 are associated with seed maturation and grain density, respectively (Fig. S13). The remaining gene models were associated with root traits. Comparing the expression of these gene models in wheat spikes under controlled and heavy metals conditions, lower expression was found under abiotic stress conditions compared with the controlled conditions for all five genes except TraesCS5B02G323600 and TraesCS6B02G178000 that had higher expression under abiotic stress than controlled conditions. Out of these gene models, TraesCS6B02G178000 had the highest expression in grains based on the expression maps.

Gene enrichment analysis of the identified gene models and the biological processes pathways

To provide more understanding of the genetic control of the studied kernel traits under the different conditions, gene enrichment of the identified gene models was investigated. Due to the large number of the detected gene models, a cutoff with a value of 1% FDR was applied to detect the most important gene models. Under controlled conditions, a total of 84 biological process pathways were identified (Table S6). However, based on the 1% FDR significance level, nine biological pathways were detected (Table 2). These nine pathways were controlled mainly by three important gene models and working in three networks (Figs. 3 and S14a). The first network mainly controlling the response of the plant to other organisms and is controlled by TraesCS5A02G424400 gene model. While network 2 is controlled by TraesCS6B02G178000 gene model and controlling the production of wax metabolic process, Alkane biosynthetic, and fatty acid biosynthetic pathways. Network 3 mainly controlling the exportation of ribosomal small subunit from the cell and controlled by TraesCS2D02G427900 gene model.

Table 2 Identified biological process pathways for the gene models controlling the different kernel traits under controlled conditions, enrichment FDR value, number of gene models included in each pathway, fold enrichment, and Go ID
Fig. 3
figure 3

Network of the highly significant biological process pathways controlling kernel traits under controlled conditions and the genetic control of each network

Under Pb conditions, a total of 87 biological process pathways was identified (Table S6). This number of pathways was reduced to 18 highly significant pathways (FDR < 1%) (Table 3). The identified 18 pathways could be grouped into three different networks (Fig. S14b). The genetic control of network 1 was dependent mainly on ten different gene models. While Network 2 and Network 3 were controlled by four and three gene models, respectively (Fig. 4). Under Sn conditions, a total of 104 pathways were detected with 18 highly significant pathways (Table S6 and Table 4). These 18 pathways were found to be genetically controlled by 12 gene models and classified into six different groups controlled by a number of six, one, one, two, one, and one gene/s for Network1, 2, 3, 4, 5, and 6, respectively (Figs. 5 and S14c).

Table 3 Identified biological process pathways for the gene models controlling the different kernel traits under lead conditions (Pb), enrichment FDR value, number of gene models included in each pathway, fold enrichment, and Go ID
Fig. 4
figure 4

Network of the highly significant biological process pathways controlling kernel traits under lead (Pb) conditions and the genetic control of each network

Table 4 Identified biological process pathways for the gene models controlling the different kernel traits under tin conditions (Sn), enrichment FDR value, number of gene models included in each pathway, fold enrichment, and Go ID
Fig. 5
figure 5

Network of the highly significant biological process pathways controlling kernel traits under tin (Sn) conditions and the genetic control of each network

Genetic control of kernel traits in the selected superior genotypes under each condition

In our previous study (Mourad et al. 2021b), we phenotypically selected the superior genotypes based on the rank of the selection indices of each studied trait under both Pb and Sn conditions as well as the percentage of reduction in each kernel trait (Table S7). However, it was worth to investigate and confirm the spuriousness of the selected genotypes based on the presence of the identified significant markers associated with kernel traits. Under Pb, 20 genotypes were selected. Number of significant markers ranged from 37 to 158 markers in Svevo and Benisweif_5, respectively (Fig. 6a). Furthermore, number of the identified biological networks in these genotypes was investigated based on the presence/absence of the gene model(s) controlling the network and presented in Fig. S15a and Table S7. Benisweif 7 that was selected as a superior genotype based on KL, KW, and TKW was found to carry 122 significant markers.

Fig. 6
figure 6

Number of target allele in each of the selected superior genotypes under lead Pb (a) and tin (Sn) (b) stress conditions

Under Sn, 20 genotypes were phenotypically selected to be tolerant to Sn stress condition. The number of significant markers in each genotype ranged from 22 to 98 markers in LMPG-6 and PI_469085, respectively (Fig. 6b). The presence of the identified biological networks was investigated in these selected genotypes and presented in Fig. S15b and Table S9. The Egyptian genotype Gimmiza11 that was selected to be superior genotype based on the four studied traits was found to carry 52 significant out of the 167-identified significant markers.

Discussion

Improving wheat kernel yield is the goal of wheat breeders. Kernel traits are important characteristics that influence wheat yield and end-use quality. Therefore, studying the genetic control of such important traits is urgently needed to improve wheat production (Liu et al. 2018; Sallam et al. 2018; Mourad et al. 2019; Muhammad et al. 2020; Al-Ashkar et al. 2020). Furthermore, heavy metals contamination became a serious threatened that affect human health and crop growth all over the world. Few studies were conducted to understand the genetic control of wheat tolerance to heavy metals (Safdar et al. 2020; Zhou and Li 2022). This study, for the first time, investigated the genetic role of kernel traits under controlled and heavy metals conditions. Understanding the genetic control of kernel traits under Pb and Sn contamination will accelerate wheat breeding future and enable wheat breeders to produce high yielding genotypes that are tolerant to such important heavy metals. This will be done by phenotypically and genotypically select the best tolerant genotypes and use them in future breeding programs for Pb/Sn tolerance. This will help wheat breeder to produce genotypes that could thwart expected soil changes due to the continuous contaminations.

Association mapping for kernel traits under the controlled and heavy metals conditions

Accuracy of evaluated population and the marker data sets

The recent studied population is a highly diverse one that collected from 14 different countries around the world. It was reported as a perfect material in identifying marker allele associated with different biotic and abiotic stresses (Joshi and Nayak 2010; Abou-Zeid and Mourad 2021; Ahmed et al. 2021; Mourad et al. 2021a, 2022; Hasseb et al. 2022). Furthermore, high genetic variation with high degree of heritability was found among the tested genotypes in kernel traits under all the studied conditions (Mourad et al. 2021b). Moreover, the number of studied genotypes in this population (103 genotypes) is suitable for GWAS study that requires 100–500 genotypes (Kumar et al. 2011; Sallam and Martsch 2015; Alqudah et al. 2020). Therefore, this population is suitable for identifying the genetic control of kernel traits using GWAS.

Due to the continuous advances in sequencing methods, different marker data sets have been appeared. In the recent study, we used two different marker sets that are completely different in their technique. However, both sets covered the whole wheat genome and provide high number of markers. GBS was used previously in many of our studies and was found to be effective (Mourad et al. 2020; Abou-Zeid and Mourad 2021). Furthermore, 2 25K-SNP array was reported as an effective marker set and used widely in wheat association studied (Aleksandrov et al. 2021). The two marker data sets covered different parts from the wheat genome. Therefore, combining both marker sets will provide more understanding of the genetic control of the studied kernel traits.

Genetic control of kernel traits under controlled and heavy metals conditions

In the recent study, GWAS was conducted using three different models (GLM, MLM, and FarmCPU) including Kin and/or PCA to correct the population structure effect. In most cases, FarmCPU, either with PCA/Kin/PCA + Kin was the best model, except for very few cases where MLM model was the best one (Figs. S1, S2, S3, S5, and S6). The accuracy of FarmCPU model was reported previously due to its avoidance of negative false associations occurred by MLM (Liu et al. 2016; Kaler et al. 2020; Muhammad et al. 2021). However, due to the presence of few cases where MLM was identified as a better model, we can conclude that it is always better to test all available GWAS models to detect the best model that fits into the data. Similar results were reported in other studies (Mourad et al. 2022).

Under controlled and heavy metals conditions, GWAS identified markers associated with kernel traits that distributed across the whole wheat genome. Therefore, we can conclude that kernel traits are controlled by a wide genetic base. Previous studies reported the presence of such a wide genetic base for kernel traits using different wheat populations as well as different types of molecular markers (Ramya et al. 2010; Okamoto et al. 2013; Cui et al. 2014; Wang et al. 2022). Despite the highly significant correlation that was found among TKW and the remaining kernel traits under the different conditions (Mourad et al. 2021b), no or very few markers were found to be significantly associated with TKW and other kernel traits (Fig. 1b-d). Therefore, we can conclude that KL, KW, and KD are good contributors to the kernel yield of wheat however, each characteristic is controlled by a different genetic system. Therefore, understanding the genetic role of each kernel characteristic under controlled and heavy metals conditions is important to obtain genotypes with high yield and high quality under the different conditions.

To further investigate the genetic control of kernel traits under controlled and heavy metals conditions, gene models harboring the significant markers were detected. Interestingly, only one gene model (TraesCS2A02G080700) was found to be associated with kernel traits under controlled, Pb, and Sn conditions suggesting the presence of different genetic systems controlling kernel development and wheat yield under each condition. This common gene model was a major gene that controlled KL under controlled condition (R2 19.6%) and STI for KL under Pb and Sn stress conditions (R2 = 32.90% and 26.50%, respectively). This gene controls kinase interacting (KIP1-like) family protein, a protein that plays an important role in the growing of the pollen tube during hybridization (Skirpan et al. 2001). The production of this protein was reported also by a different gene models that was significantly associated with spike length, grain number, and grain setting and highly expressed in the early spike developmental stage (Pang et al. 2020). This gene seems to be very important in improving KL as well as KL tolerance under Pb and Sn conditions. However, more investigations are needed to confirm our findings.

Moreover, a major common gene model that controlling KL under controlled and Pb conditions (TraesCS3A02G516400) was identified. This gene model was functionally annotated to control the production of ABC transporter ATP-binding protein, a protein that were reported to be primarily involved in cadmium tolerance in wheat as it plays an important role in cadmium detoxification as well as mycotoxin tolerance (Walter et al. 2015). Furthermore, silencing genes controlling the production of this protein in some wheat varieties were resulted in spike development defection, reducing grain filling, and reducing number of spikelets/spike (Bhati et al. 2016). The significant association with a major effect of this gene model with KL under controlled and Pb conditions represented its importance in producing high kernel yield under both conditions.

Out of the identified gene models that harboring kernel traits under the different conditions, five were common between Ctrl and Sn. These gene models could be classified into three major genes (TraesCS5A02G424400, TraesCS5B02G323600, and TraesCSU02G048500) and two minor genes (TraesCS4A02G334100 and TraesCS6B02G178000). Interestingly, TraesCS5A02G424400 gene model contained three 25K-SNP markers that majorly affect TKW under Ctrl conditions. Furthermore, it has a major effect in STI for TKW under Sn stress conditions. This gene was annotated to control peroxisomal (S)-2-hydroxy-acid oxidase, a big enzymes family that plays an important role in the fatty acids and protein catabolism as well as in plants photorespirations (Martina et al. 2020). Its role in kernel development was not clarified, however higher expression of this protein was found during the flowering stage under elevated CO2 stress conditions (Maurya et al. 2020). Furthermore, this enzyme was reported to be involved in plant primary metabolism, a metabolism that enable the plant to be ready for any biotic/abiotic stress and improves plant growth (Maurino and Engqvist 2015). Therefore, the role of this gene in improving kernel traits could be illustrated. TraesCS5B02G323600 gene was found to be a major gene controlling KW under Ctrl conditions as well as STI for KW under Sn conditions. Furthermore, it was found to be a minor gene controlling KW under Sn conditions (Fig. S13 and Tables S1 and S3). It was functionally annotated to control the production of late embryogenesis-like protein (LEP), a protein that accumulates in the late stage of embryogenesis. It was reported to play an important role in late response to abiotic stresses such as drought and salinity in legumes as well as fusarium aflatoxin resistance in maize (Chen et al. 2002; Hundertmark and Hincha 2008; Battaglia and Covarrubias 2013; Gao and Lan 2016). Moreover, LEP was found to be associated with kernel length in wheat (Daba et al. 2018). The third major gene model, TraesCSU02G048500, was significantly associated with KD under controlled conditions and KW under Sn conditions. This gene model was annotated to produce cytochrome P450 superfamily. P450 enzyme was reported to have a positive effect in enhancing wheat resistance to biotic stresses, mainly deoxynivalenol (DON) produced by fusarium as well as improving wheat yield by increasing kernel yield (Gunupuru et al. 2018; Li and Wei 2020). These three major genes controlling kernel traits under Ctrl and Sn are important in improving wheat productivity and tolerance to heavy metals such as Sn.

Furthermore, the minor gene model, TraesCS4A02G334100, was associated with KL under Ctrl and STI for KL under Sn. It was functionally annotated to produce glycine rich protein 3, a protein that was reported to control flowering, senescence and increasing protein content in barley kernels (Alptekin et al. 2021), improving rice tolerance to drought (Shim et al. 2021), and improving wheat tolerance to different abiotic stresses such as cold, salinity, and drought (Xu et al. 2014). Despite that, there was no record for expression of this gene in wheat spikes, based on our results the target allele of the significant marker located within TraesCS4A02G334100 gene was found to significantly increase kernel length under Ctrl and Sn, as well as STI for KL (Fig. S13). Therefore, we can conclude that this gene plays an important role in wheat kernel growth under different conditions. However, more studies of this gene and its related enzyme are required. The minor gene TraesCS6B02G178000 was associated with TKW under Ctrl and TKW, KW, STI for KD, and STI for TKW under Sn conditions (Fig. S13). This gene model was functionally annotated to produce fatty acid hydroxylase superfamily that was reported to play a complementary role with P450 to improve anther and pollen formation in rice (Li et al. 2010) and improves maize kernel content of provitamin-A, hence, improving kernel quality (Dutta et al. 2019). Generally, unsaturated fatty acids play an important role in plant response to stresses (He and Ding 2020). Therefore, the role of fatty acids hydroxylase superfamily in converting saturated fatty acids into unsaturated fatty acids improves the plant tolerance to stresses.

The identified seven gene models that associated with kernel traits under more than one studied condition are very important in accelerating the future of wheat breeding to expected abiotic stresses as well as improving plant yield and end-use quality in the cultivated wheat genotypes. Furthermore, due to the polytropic effect of most of these gene models against aflatoxins produced by fungal diseases such as fusarium head blight, these seven genes will improve wheat quality under different biotic and abiotic stress conditions. More studies are needed to investigate the role of these genes under different conditions. Such detected genes which has alleles that increased the susceptibility to Pb and Sn can be knocked out from the target genotypes using CRISPR cas9 technology.

Gene enrichment, biological processes, and networks of genes controlling kernel traits under the different conditions

Due to the large number of identified gene models that controlling kernel traits under controlled, Pb, and Sn stress conditions as well as the very few numbers of common gene models among the different conditions, it was concluded that kernel traits are controlling with different genetic system under each condition. Therefore, it was worth to investigate the biological processes that lead to kernel development and their genetic role under each condition. For this purpose, gene enrichment analysis was conducted for all the identified gene models and high number of biological pathways were identified (Table S6). To predict the most important biological pathways, a cutoff with p-value 1% FDR was applied. As a result, a number of nine, 18, and 18 biological pathways were identified as the most significant biological pathways controlling kernel development under Ctrl, Pb, and Sn, respectively.

The nine biological pathways identified under Ctrl conditions were found to consist of three different networks (Figs. 3 and S14a). Network 1 was mainly controlling wheat response to biotic stresses such as fungus, virus, bacteria, and other organisms. However, looking for the gene model controlling this network, as we mentioned previously, this gene model functionally prepare the plant to any stress and improve the plant development, improving the kernel size, quality, and yield (Maurino and Engqvist 2015). Therefore, network 1 and its genetic control aims to improve yield quality. Furthermore, previous studies concluded that some disease resistance proteins are helpful for kernel developments and quality (Perlikowski et al. 2014). Therefore, the presence of biological processes that controlling disease resistance work in the same way in improving kernel yields and quality. Network 2, containing mainly biological processes that aims to produce wax. Wax is an essential content of wheat kernel skin that protects the embryo from the outside environments. Therefore, improving wax content helps in improving kernel yield and quality (Chaban et al. 2021). Network 3 controlling the exportation of ribosomal subunit from nucleus, an important biological process that is essential in wheat growth, development, biotic response, and abiotic response (Kalinina et al. 2018; Ramu et al. 2020; Appels et al. 2021). Therefore, this network is important in improving kernel yield and quality.

The 18 pathways identified under Pb conditions were divided into three networks (Fig. 4). Network 1 contained 14 biological pathways and controlled by ten gene models (Figs. 4 and S14b). All the biological pathways in this network aims to regulate the physiological membrane processes in the plant cell by regulate cell membrane exocytosis, vesicle localization, and Golgi vesicle transport (Table 3). The main aim of these biological processes is to maintain the concentration of useful metals in the target concentrations and reduce the absorption of harmful heavy metals such as cadmium, arsenic, and mercury into the cell (De Caroli et al. 2020). Therefore, the significant association of this network and its genetic control plays an important role in wheat tolerance to Pb in terms of kernel yield and quality. Network 2 controlling lipid catabolic process, an expected response to heavy metals stress that leads to lipid peroxidation and membrane damage (Rizvi et al. 2020). The role of this network in kernel development is not clear. However, it was found to be genetically controlled by four different genes (Fig. 4). The functional annotation of these gene models were Patatin-like phospholipase, Phosphoinositide phospholipase C, Fatty acid oxidation complex subunit alpha, Phospholipase D, all that correlated with kernel growth in plants (Yamaguchi et al. 2019; Gao et al. 2021; Wu et al. 2022). The last network, network 3, was associated with the superoxidative metabolic process and cysteine biosynthetic processes. The regulation of superoxide metabolic process was reported as a tolerant technique that helps the plant to tolerate the effect of harmful metals (Culotta et al. 2006). Cysteine is a component of wheat gluten. It was reported that increasing the cysteine content in the cereal kernels, increasing its quality (Whitcomb et al. 2020). Therefore, this network is very important in improving wheat tolerance to Pb as well as producing kernel yield with high end-use quality.

Under Sn conditions, six networks were identified. Network 1 was found to control the metabolism and biosynthetic of organic amino acids such as L-proline, arginine, carboxylic acid, oxoacid, and dicarboxylic acid (Figs. 5 and S14c). Amino acids are main components of wheat grains that consisting in building proteins, hence improving kernel end-use quality traits (Hayat et al. 2013). Furthermore, some amino acids such as L-proline have been reported as an oxidative defense molecule and metal chelator that provides osmo-protection in wheat cells growing under different abiotic stresses and improve its tolerance (Liang et al. 2013; Iqbal et al. 2019). Accumulation of L-proline was reported as a natural response that improves plant tolerance to cadmium (Hayat et al. 2013). Moreover, applying L-proline to wheat seeds as a pre-sowing coating, enhanced wheat growth and yield (Kamran et al. 2009). Carboxylic acid was also reported to improve wheat resistance to abiotic stresses and play a role similar to fungicides (Dorostkar et al. 2022). Therefore, we can conclude that network 1 is very helpful in improving wheat tolerance to Sn stress as well as improves kernel yields and end-use quality. Notably, network 2 and network 3 are common between Sn and controlled conditions due to their common genetic control. Therefore, there role in kernel development will not be discussed here. However, it worth to mention that the primary response provided by network 3 and wax formation provided by network 2 are two important processes in improving wheat tolerance to abiotic stresses including heavy metals which confirms our findings (Wang et al. 2015; Li et al. 2019). Network 4 contained three gene models that were associated with tRNA metabolic process and tRNA-guanine transglycosylation enzyme. The metabolic of tRNA requires many enzymes to be done such as TRNA-guanine transglycosylation and Amino-acyl tRNA biosynthesis. Amino-acyl tRNA biosynthesis has been identified as a key enzyme that involved in grain filling process in wheat (Rangan et al. 2017). Hence, plays an important role in kernel traits as well as end-use quality traits. Unfortunately, there is no enough information about the role that tRNA-guanine transglycosylation enzyme plays during grain development. Network 5 aims to regulate G2/M transition of miotic cells. G2/M transition is a checkpoint for cells to inter the miosis phase. Higher expression of genes controlling G2/M transition was reported in the embryo of Arabidopsis plant (Lin et al. 2007). Therefore, this network seems to be important for kernel developments in wheat. Furthermore, G2/M transition has an important role in plant tolerance in heavy metals tolerance (Francis 2011; Pedroza-Garcia et al. 2022). Network 6 that control Manganese (Mn+2) transportation is very important in improving end-use quality. Mn+2 is an essential element for plant growth. Deficient in Mn+2 in plant results in growth and yield decreasing, susceptibility to pathogens, and susceptibility to abiotic stresses (Socha and Guerinot 2014; He et al. 2021). It was reported that Mn+2 is important in heavy metals detoxification process (Pittman 2005; Qiao et al. 2019). Therefore, all the identified networks seem to be highly important to improve wheat tolerance to Sn, producing high kernel wheat, and high end-use quality.

Investigate the genetic control of kernel traits in the selected genotypes

In our previous study (Mourad et al. 2021b), twenty genotypes were selected under each heavy metal condition based on the rank of all selection indices and the reduction of kernel traits (Table S7). To confirm the superior response of the selected genotypes, the number of significant markers in each selected genotype was investigated. Out of the twenty genotypes selected under Pb conditions, two genotypes (Debra-1 and Svevo) have a little number of the significant markers (38 and 37, respectively). However, these two genotypes had only GBS marker data and were not sequenced using 25K-SNP. Based on the number of significant markers, six genotypes were found to contain more markers than Beniweif_7 that was selected as a superior genotype with a small reduction in KL, KW, and TKW. For example, Benisweif_5 that contained the highest number of target alleles (158 markers) had a higher reduction in KW and TKW compared with Benisweif_7 (Table S7). Out of the identified 167 significant markers associated with the kernel traits under Sn conditions, few markers were found in the selected 20 genotypes. The highest number of markers was 98 in the Greek genotype (PI_469085). This genotype was selected only based on KW. The Egyptian genotype, Gimmiza_11, that was selected as a superior genotype based on the four kernel traits contained only 52 significant markers. Some of superior genotypes contained very low number of significant markers such as LMPG-6, Line E, and Morocco. These genotypes were genotypes were sequenced using only GBS data set. Therefore, we can conclude that using different sequencing methods in the GWAS study enable the breeder to understand more about the genetic control of the studied traits. The presence of lower number of significant markers in the best genotypes concluded that other genes controlling kernel traits may be exist in these genotypes that we could not identify using the recent marker sets. Moreover, selection of superior genotypes should be done by combining both phenotypic and genotypic data. The results of GWAS should be exploited in the selection of the genotypes having target traits by identifying the number of target alleles in each genotype (Tekauz 2003; Sallam et al. 2016; Abou-Zeid and Mourad 2021; Eltaher et al. 2021b; Ghazy et al. 2021; Abo-elyousr et al. 2022).

Conclusion

In conclusion, kernel traits are complicated traits that controlled by a wide genetic base. This base is completely different under controlled and heavy metals conditions, as well as under different heavy metals. The recent study, for the first time unlocked the genetic role of kernel traits and identified some major genes under each condition. The significant markers detected in this study should be validated in a different genetic background to be used for marker-assisted selection. The most important SNP found in this study was Ku_c269_2643 which located within TraesCS2A02G080700 and seems to be very important in improving wheat yield and quality under a wide range of abiotic stresses. The selected superior genotypes (Gimmiza_11 and Benisweif_7 for Sn and Pb, respectively) seem to contain other unknown genes controlling kernel traits. However, the selected genotypes are a good source of Pb and Sn tolerance under the Egyptian conditions.