Background

Wheat yield has more than doubled over the last century because of genetic improvement, mainly through the effect of breeding on plant height and floret fertility [1, 2]. Strong selective pressure was exerted on the harvest index (expressed as the ratio between grain weight and total plant biomass), for which values of 40–50% were recorded [2]. By contrast, the total biomass of wheat genotypes, including tetraploid wheat, remained almost unchanged, indicating that the increase in yield was associated exclusively with a different partitioning of photosynthates [3].

More recently, crop residues such as wheat straw have played an increasingly important role as a source of renewable energy, since modern technological processes aim to use inexpensive raw materials (second-generation biofuels) and are better suited to producing bioethanol and biogas [4, 5]. Unfortunately, the complex structure of these lignocellulosic materials makes the polymers they contain (cellulose, hemicellulose, and lignin) very resistant to fungal enzymatic and chemical degradation, resulting in a low conversion efficiency that significantly limits their use for biofuel production [6]. Therefore, it might be appropriate to alter the chemical properties of wheat straw while preserving the levels of defense against pathogens, which also depend largely on cell wall components [7, 8].

To ensure the supply of this type of biomass to the bioenergy industry, an integrated systems biology approach is needed to define the plant ideotype capable of optimizing the production of biofuels without compromising the food production of the crop. This requires a multidisciplinary approach combining plant physiology, biochemistry, molecular genetics, and genomics technologies to improve total biomass production and tailor the composition of crop residues to the needs of the bioenergy industry [9]. Until now, the preferred strategy for modifying cell wall components in many species of bioenergetic interest has been the isolation of mutants [10] defective in the synthesis of enzymes involved in the lignin biosynthetic pathway (brown-midrib). The use of these mutants, however, is hampered by low biomass production and low yields [11, 12].

Although numerous genomic tools are widely available for different cereal species, the complexity of the lignin and cellulose synthesis system makes biotechnological approaches for altering the synthesis of these polymers very difficult in both bread and durum wheat. To overcome this issue, exploring and exploiting the genetic variability within wheat species appears more suitable for mapping the genetic determinants of the traits of interest. Genome-wide association study (GWAS) has become a routine and powerful approach for high-resolution genetic mapping of complex traits in plants [13]. Economically important traits, including agronomic and yield-associated traits (reviewed in [14]) as well as biotic [15,16,17] and abiotic stress tolerance [18,19,20,21], have been mapped using GWAS in wheat. Conventionally, GWAS was performed using a single-locus mixed linear model (SL-MLM) [22]. In the last few years, multi-locus mixed linear models (ML-GWAS) have been developed, as they have higher power to detect significant marker-trait associations for complex traits than conventional SL-MLM methods [23,24,25,26,27,28].

ML-GWAS involves a multi-dimensional genome scan in which the effects of all markers are estimated simultaneously, and it does not require a multiple-test correction, a statistical procedure that can be too conservative, especially when analyzing complex traits regulated by many genes with small effects. For these reasons, we selected six ML-GWAS models that involve two steps. During the first step, a single-locus GWAS method is applied to scan the entire genome, and putative QTNs are detected according to a less stringent critical value, such as P < 0.005 or P < 1/m, where m is the number of markers. During the second step, all selected putative QTNs are examined by a multi-locus GWAS model to detect true QTNs. Since marker effects are tested simultaneously in ML-GWAS models, they represent appropriate genetic models for the molecular dissection of complex traits such as those involved in biomass composition. On this basis, the present study aimed to: i) evaluate wheat straw composition and morphological traits through ML-GWAS using a collection of tetraploid wheat species; ii) identify novel genomic regions associated with these traits and suggest candidate genes; and iii) validate SNP markers for marker-assisted selection.
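The first-step screening rule described above can be illustrated with a minimal sketch. It assumes that single-locus P values have already been computed; the marker names, the P values, and the helper `screen_putative_qtns` are hypothetical and do not correspond to any function of the ML-GWAS software used in this study.

```python
def screen_putative_qtns(p_values, threshold=None):
    """Return markers that pass the relaxed first-step critical value.

    p_values:  dict mapping marker name -> single-locus P value.
    threshold: critical value; if None, 1/m is used, where m is the
               number of markers (one of the two rules quoted in the text).
    """
    m = len(p_values)
    if m == 0:
        return []
    if threshold is None:
        threshold = 1.0 / m
    return sorted(name for name, p in p_values.items() if p < threshold)


# Hypothetical P values for five markers (illustration only).
example = {"snp1": 0.0004, "snp2": 0.30, "snp3": 0.15, "snp4": 0.004, "snp5": 0.21}

putative_1_over_m = screen_putative_qtns(example)        # threshold = 1/5
putative_fixed = screen_putative_qtns(example, 0.005)    # threshold = 0.005
print(putative_1_over_m)
print(putative_fixed)
```

Only the markers retained at this stage would be passed to the second-step multi-locus model, which is not reproduced here.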

Results

Phenotypic variation and correlations analysis

The data set from the three-year field trials was examined with an ANOVA to partition the variation into: (i) genotype effect, (ii) year effect, (iii) genotype × year interaction, and (iv) residuals (Fig. 1). Using BLUP values, the percentage of variation attributed to the genotype ranged between 11.2% for Biomass and 90.1% for PH. The year effect was generally low, except for Biomass and GW, for which it accounted for 72.1% and 67.8%, respectively. The genotype × year interaction was highest for ADF and SCSb, where it exceeded 60%. By contrast, it was lower for the remaining traits, reaching a minimum of 10.1% for Biomass. The same dataset was also used to calculate broad-sense heritability (H2). The H2 values ranged from a minimum of 0.05 for Biomass and FTN to a maximum of 0.93 for PH (Fig. 1). H2 values higher than 0.60 were observed for SCSm, HI, PH, and SPL, whereas values lower than 0.30 were observed for TTN, FTN, Biomass, ADF, NDF, CEL, HEM and GW. Best linear unbiased predictors (BLUPs) were then calculated for all traits and used for PCA, Pearson correlation analysis, and ML-GWAS. The BLUP distributions are reported in Supplementary Fig. 1 and in Supplementary Table 1. Differences among Triticum subspecies were observed in the BLUP distributions (Supplementary Fig. 2). For example, durum wheat accessions showed the lowest and highest BLUP values for PH and HI, respectively. Significant differences were also observed for ADL and SPL (Supplementary Fig. 2) between durum wheat and the other tetraploid subspecies. The PCA graph confirmed the variability of accessions according to subspecies, highlighting a differentiation of the durum accessions from the rest (Fig. 2). The first five principal components (PCs) explained up to 73% of the total variance. Among them, the first two (PC1 and PC2) accounted for 37.4% of the total variation, with 21.5% and 15.9% for PC1 and PC2, respectively.
Biomass and related traits such as FTN, TTN, SPL, and PH were mainly associated with PC1 (Fig. 2), whereas PC2 was mostly driven by SCSa, SCSb, SCSm, NDF, ADF, and CEL. PC1 also discriminated durum wheat genotypes from the other tetraploid wheat accessions. Pearson correlation analysis was used to better understand the pairwise relationships among the traits under investigation (Fig. 3). Sixty-two significant trait-pair correlations (P ≤ 0.05) were identified. Of these, 34 were positive and 28 were negative. Strong positive correlations were found between FTN and TTN and between CEL and ADF (r = 0.96 and 0.83, respectively), whereas PH and HI, together with SPL and HI, showed the strongest negative correlations (r = − 0.67 and − 0.44, respectively). In addition, SCSa, SCSm and SCSb were all positively correlated with each other (r > 0.40), whereas they were negatively correlated with TTN, ADL, and PH (Fig. 3). ADL, Biomass, SPL, and PH were also negatively correlated with CEL and HI.
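As an illustration of how the variance partition and H2 reported above relate to each other, the following sketch converts variance components into percentages and computes broad-sense heritability on an entry-mean basis. The entry-mean formula, the number of replicates, and all numeric values are assumptions made for the example; the exact estimator used in this study is not restated in this section.

```python
def variance_percentages(var_g, var_y, var_gy, var_e):
    """Express each variance component as a percentage of their sum."""
    total = var_g + var_y + var_gy + var_e
    return {
        "genotype": 100.0 * var_g / total,
        "year": 100.0 * var_y / total,
        "genotype_x_year": 100.0 * var_gy / total,
        "residual": 100.0 * var_e / total,
    }


def broad_sense_h2(var_g, var_gy, var_e, n_years, n_reps):
    """Entry-mean broad-sense heritability (assumed formula):
    H2 = Vg / (Vg + Vgy / years + Ve / (years * reps))."""
    return var_g / (var_g + var_gy / n_years + var_e / (n_years * n_reps))


# Hypothetical variance components (illustration only).
shares = variance_percentages(var_g=6.0, var_y=1.0, var_gy=2.0, var_e=1.0)
h2 = broad_sense_h2(var_g=6.0, var_gy=2.0, var_e=1.0, n_years=2, n_reps=1)
print(shares)
print(round(h2, 2))
```

Under this formula, a trait dominated by year or genotype × year variance, as reported here for Biomass, necessarily yields a low H2.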

Fig. 1

Variance component analysis of phenotypic traits measured in 185 wheat genotypes. The plot shows the percentage of variation explained by each component. The components of phenotypic variation are: i) Genotype, ii) Year, iii) Genotype × Year interaction and iv) residuals, expressed as a percentage of the observed variation. The value on top of each bar represents the broad-sense heritability for each trait

Fig. 2

Whole phenotypic variability of the 185 wheat genotypes. Loading plot of the first (PC1) and second (PC2) principal components showing the variation for 15 traits. Genotypes are represented by different colored symbols according to Triticum ssp., as indicated in the legend. Trait contributions are shown with arrows. The direction and distance from the center of the biplot indicate how each trait contributes to the first two components. Trait acronyms are in Supplementary Table 3

Fig. 3

Pearson correlation coefficients between pairs of phenotypes. Correlation coefficients are indicated in each cell. Colored correlations are those with P value < 0.05 after Bonferroni correction. Color intensity is directly proportional to the coefficients. The color legend below the correlogram shows the correspondence between colors and correlation coefficients. Trait acronyms are reported in Supplementary Table 3

SNP markers statistics, population structure and linkage disequilibrium

A total of 6,090 variants were removed from the raw dataset due to missing genotype data, and a further 49,875 variants were removed by the minor allele frequency (MAF) threshold, yielding a total of 20,755 filtered variants. The 20,755 SNPs included 9,199 and 11,557 on sub-genomes A and B, respectively. The number of markers on each chromosome ranged between a minimum of 954 SNPs on chromosome 4A and a maximum of 2,075 on chromosome 2B. On sub-genome A, the highest number of SNPs was on Chr. 7A (1,687), followed by 2A (1,431), and the lowest on 4A (954). On sub-genome B, the highest number of SNPs was detected on Chr. 2B (2,075), followed by 1B (1,984), and the lowest on 4B (1,031). The SNP distribution on durum wheat chromosomes is provided in Fig. 4.
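The two filtering criteria mentioned above (missing genotype data and MAF) can be sketched as follows. The cutoffs of 10% missing data and 5% MAF, the 0/1/2 dosage coding, and the marker names are assumptions chosen for illustration; the thresholds actually applied are not restated in this section.

```python
def missing_rate_and_maf(calls):
    """calls: list of allele dosages (0, 1 or 2), with None for missing data."""
    called = [g for g in calls if g is not None]
    if not called:
        return 1.0, 0.0
    missing_rate = 1.0 - len(called) / len(calls)
    alt_freq = sum(called) / (2 * len(called))
    return missing_rate, min(alt_freq, 1.0 - alt_freq)


def filter_snps(genotypes, max_missing=0.10, min_maf=0.05):
    """Keep markers whose missing rate and MAF satisfy both (assumed) cutoffs."""
    kept = []
    for marker, calls in genotypes.items():
        missing_rate, maf = missing_rate_and_maf(calls)
        if missing_rate <= max_missing and maf >= min_maf:
            kept.append(marker)
    return kept


# Hypothetical genotype calls for ten accessions (illustration only).
genotypes = {
    "snpA": [0, 0, 2, 2, 0, 2, 0, 2, 2, 0],           # complete, MAF = 0.5
    "snpB": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],           # monomorphic, MAF = 0
    "snpC": [0, None, None, 2, 0, 2, None, 2, 0, 2],  # 30% missing
}
kept_markers = filter_snps(genotypes)
print(kept_markers)
```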

Fig. 4

SNP density plot showing the number of SNPs within 1 Mb windows along sub-genomes A (a) and B (b). The horizontal axis shows the chromosome length (Mb); the different colors depict SNP density

The filtered panel of 20,755 SNP markers was then used to investigate the population structure of the 185 tetraploid wheat genotypes by PCA. The analysis indicated the existence of three distinct major clusters (Supplementary Fig. 3). The first dimension (PCA1) separated all durum wheat samples (ssp. durum) from the others, with the exception of a few individuals belonging to ssp. turanicum, which clustered together with the durum genotypes. Durum wheat cultivars were also distributed along the PCA2 axis, which separated the ancient/old varieties such as Aziziah, Timilia, Grifoni and Capeiti (quadrant IV) from the more recent ones (quadrant I), which included elite varieties mainly derived from national and international breeding programs. In addition, PCA2 further distinguished accessions belonging to ssp. turgidum, turanicum and polonicum from those belonging to ssp. carthlicum, dicoccum and dicoccoides, with some exceptions. The linkage disequilibrium (LD) decay distance, at which r2 fell to 0.20, was ∼ 1.8 Mb, 1.6 Mb and 1.7 Mb for the whole genome and for sub-genomes A and B, respectively (Supplementary Fig. 4).
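The LD statistic referred to above can be sketched as the squared correlation between the dosages of two markers, followed by a simple binned estimate of the distance at which mean r2 drops below 0.20. The binning approach is a simplification chosen for this example (LD decay is often estimated by fitting a decay curve instead), and the distances, r2 values, and bin size are hypothetical.

```python
from statistics import mean


def ld_r2(a, b):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a monomorphic marker carries no LD information
    return cov * cov / (va * vb)


def decay_distance(pairs, cutoff=0.20, bin_mb=0.5):
    """pairs: iterable of (distance in Mb, r2).
    Returns the lower bound of the first distance bin whose mean r2 is
    below the cutoff, or None if no bin falls below it."""
    bins = {}
    for dist, r2 in pairs:
        bins.setdefault(int(dist // bin_mb), []).append(r2)
    for idx in sorted(bins):
        if mean(bins[idx]) < cutoff:
            return idx * bin_mb
    return None


# Hypothetical marker pairs (illustration only).
pairs = [(0.1, 0.8), (0.3, 0.6), (0.7, 0.4), (0.9, 0.3),
         (1.2, 0.25), (1.6, 0.15), (1.8, 0.1)]
print(ld_r2([0, 0, 2, 2], [0, 0, 2, 2]))
print(decay_distance(pairs))
```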

Genome-wide association analysis

ML association mapping was conducted using only the correction for kinship, as this proved the most appropriate approach for the panel under study. A total of 470 significant QTNs were found to be associated with the 15 traits (LOD ≥ 3). Of these, 49, 64, 23, 166, 90, and 68 QTNs were detected using mrMLM, pLARmEB, FASTmrEMMA, pKWmEB, FASTmrMLM and ISIS EM-BLASSO, respectively (Supplementary Fig. 5). QQ plots for each trait are reported in Supplementary Fig. 6. Only 72 QTNs for 15 traits (two each for ADL and GW, three each for ADF, HEM, FTN and TTN, four each for CEL, Biomass, NDF and HI, five for SCSb, six for SCSa, eight for SCSm, ten for SPL, and eleven for PH), detected by at least two methods, were declared reliable QTNs (Table 1) and used in downstream analyses. Among these, BobWhite_c44947_277, associated with SPL, was detected by all methods, whereas RFL_Contig3228_2154 and wsnp_Ex_c24135_33382521, associated with SCSb and ADL, respectively, were detected by five methods (Table 1).
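The reliability criterion used above (LOD ≥ 3 and detection by at least two methods) can be expressed as a short sketch. The method names are those listed in the text, whereas the marker names, LOD values, and the helper `reliable_qtns` are hypothetical.

```python
from collections import defaultdict


def reliable_qtns(hits, min_methods=2, min_lod=3.0):
    """hits: iterable of (trait, marker, method, lod) tuples.
    Returns {(trait, marker): sorted list of methods} for trait-marker
    pairs detected with LOD >= min_lod by at least min_methods methods."""
    methods = defaultdict(set)
    for trait, marker, method, lod in hits:
        if lod >= min_lod:
            methods[(trait, marker)].add(method)
    return {key: sorted(found) for key, found in methods.items()
            if len(found) >= min_methods}


# Hypothetical association results (illustration only).
hits = [
    ("SPL", "marker_1", "mrMLM", 7.1),
    ("SPL", "marker_1", "pKWmEB", 9.4),
    ("SPL", "marker_1", "FASTmrMLM", 2.5),  # below the LOD cutoff, ignored
    ("PH", "marker_2", "pLARmEB", 4.2),     # single method, not reliable
]
reliable = reliable_qtns(hits)
print(reliable)
```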

Table 1 Seventy-two QTNs identified using two or more multi-locus GWAS models. The logarithm of the odds (LOD) and phenotypic variance explained (R2%) for each QTN are also reported. Trait acronyms are reported in Supplementary Table 3

QTNs for biomass and chemical composition

Twenty QTNs were identified for Biomass and five related traits (ADF, ADL, NDF, CEL, and HEM) by at least two different models and were therefore considered reliable (Table 1). They were distributed on ten chromosomes: 1B, 2A, 2B, 3A, 4A, 4B, 5A, 6B, 7A, 7B. Three reliable QTNs were identified for ADF. Among these, Q.Adf-5A was annotated as a major QTN (R2 ≥ 10% in at least one method, with LOD values ranging between 3.19 and 4.27), whereas the other two (Q.Adf-1B and Q.Adf-3B) were annotated as minor. Two QTNs on chromosome 2B (one major, Q.Adl-2B.2, and one minor, Q.Adl-2B.1) were identified for ADL. Four reliable QTNs were identified for CEL; Q.Cel-2A was considered major as it explained the highest phenotypic variance (6.71–14.74%), whereas the remaining three were declared minor since their R2 values were < 10%. Four QTNs each were also identified for Biomass and NDF. Two QTNs for Biomass (Q.Biomass-2B.1 and Q.Biomass-4B) and one for NDF (Q.Ndf-4B.1) were annotated as major, whereas the others were minor.

QTNs for morphological traits

Fifty-two reliable QTNs were significantly associated with nine morphological traits, and they were distributed on all chromosomes except chr. 1B (Table 1). The highest number of QTNs was identified for PH (11), whereas the lowest was found for GW (2). Among the QTNs identified for PH, only Q.Ph-1A was considered major (R2 = 19.78%), whereas among the QTNs identified for SPL, two (Q.Spl-1A.1 and Q.Spl-3B) were considered major. In particular, Q.Spl-1A.1 explained phenotypic variation ranging between 5.72% and 11.96% and showed the highest LOD values (7.13–12.54). Eight reliable QTNs were associated with SCSm, of which four were major (Q.Scsm-3B, Q.Scsm-4A, Q.Scsm-6B.3 and Q.Scsm-6B.4). Among these, Q.Scsm-4A explained the highest phenotypic variance (12.14%), with LOD values ranging between 4.11 and 9.09. Six and five reliable QTNs were associated with SCSa and SCSb, respectively. In particular, two QTNs for SCSa (Q.Scsa-1A.2 and Q.Scsa-3A) and two for SCSb (Q.Scsb-2B and Q.Scsb-3B) were major. Four QTNs were identified for HI. Of these, Q.Hi-4A was declared major, since it explained up to 20.94% of the phenotypic variation. Three QTNs each were also identified for TTN and FTN, but they explained < 10% of the phenotypic variation. Finally, two QTNs were identified for GW, of which one was major (Q.Gw-4A).

Allelic effect of major QTNs on biomass traits

The major QTNs (R2 ≥ 10%) were also tested using a t-test (P ≤ 0.01) (Fig. 5). We divided the population into two groups according to allelic profile to test whether the mean BLUP values of the two groups were significantly different. In total, 16 QTNs had a significant effect on ten traits (Fig. 5). Among these, the highest number of significant QTNs (three) was found for SCSm (Q.Scsm-4A, Q.Scsm-6B.3, Q.Scsm-6B.4). Two QTNs each were significant for SCSa (Q.Scsa-1A.2, Q.Scsa-3A), SCSb (Q.Scsb-2B, Q.Scsb-3B), SPL (Q.Spl-1A.1, Q.Spl-3B) and Biomass (Q.Biomass-2B.1, Q.Biomass-4B). One QTN each showed a significant effect on ADF (Q.Adf-5A), CEL (Q.Cel-2A), HI (Q.Hi-4A), NDF (Q.Ndf-4B.1), and PH (Q.Ph-1A).
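The allele-based comparison described above can be sketched as follows: accessions are grouped by allele at a QTN and the two group means are compared. The example uses Welch's unequal-variance t statistic, which is an assumption, since the text does not specify the t-test variant. The accession names and BLUP values are hypothetical, and the P value, which requires the t distribution, is not computed here.

```python
from math import sqrt
from statistics import mean, variance


def split_by_allele(blups, alleles):
    """Group BLUP values by the allele carried at one QTN.
    blups:   dict accession -> BLUP value
    alleles: dict accession -> allele label"""
    groups = {}
    for accession, allele in alleles.items():
        if accession in blups:
            groups.setdefault(allele, []).append(blups[accession])
    return groups


def welch_t(x, y):
    """Welch's t statistic and approximate degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


# Hypothetical BLUPs and allele calls for six accessions (illustration only).
blups = {"acc1": 1.0, "acc2": 2.0, "acc3": 3.0, "acc4": 4.0, "acc5": 5.0, "acc6": 6.0}
alleles = {"acc1": "A", "acc2": "A", "acc3": "A", "acc4": "G", "acc5": "G", "acc6": "G"}

groups = split_by_allele(blups, alleles)
t_stat, df = welch_t(groups["A"], groups["G"])
print(round(t_stat, 3), round(df, 1))
```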

Fig. 5

Boxplots for the 16 reliable QTNs with significant effects (P < 0.01) on the corresponding traits. For each QTN, the germplasm lines were divided into two groups according to superior and inferior allele type. The X-axis represents the two alleles for each QTN, while the Y-axis corresponds to BLUP values

Identification of putative candidate genes associated with major QTNs

Genomic regions (± 1.8 Mb) surrounding the sixteen major QTNs with allelic effects were investigated (Table 2). Several genes modulating lipid, carbohydrate, and starch-sucrose metabolisms, as well as genes involved in photosynthetic processes and secondary metabolites production, were annotated within QTNs. For example, three synthases, Cellulose synthase (CESA), Sucrose Synthase 6 (SuSy) and Glucan synthase-like 4 (GSL4), along with two transferases Beta-fructofuranosidase (CIN4) and Sterol 3-beta-glucosyltransferase, all involved in sucrose and starch metabolism, were found associated with SCSa, SPL and SCSm, respectively.

Table 2 Candidate genes around the reliable QTNs and their functional annotation. The durum (Svevo) gene IDs along with direct orthologs in rice are also reported

Similarly, three peroxidases (PRX1, PRX22, and PRX36), known to be involved in the phenylpropanoid-lignin pathway, were found within QTNs associated with ADF, Biomass, and SCSa. In addition, 4-coumarate-CoA ligase (4CL) and Hexosyltransferase, both belonging to the same pathway, were also found. Genes involved in hemicellulose biosynthesis were also identified among QTNs. For example, a Glucuronoxylan 4-O-methyltransferase involved in the modification of one of the principal components of plant secondary cell walls (hemicellulose 4-O-methyl glucuronoxylan) was associated with NDF and SCSa. Interestingly, HYPONASTIC LEAVES 1 (HYL1), a gene encoding a nuclear double-stranded RNA-binding protein with a role in miRNA biogenesis, was found associated with ADF. Transcription factors belonging to the AP2-like ethylene-responsive (AP2/ERF), ethylene-responsive (ERF), WRKY, and MYB families were also annotated. Among them, a scarecrow transcription factor like OsGRAS31, a WRKY transcription factor similar to OsWRKY72, a Zinc finger-homeodomain protein 1 similar to OsZHD4, and a MYB similar to OsMYB30 were associated with HI, Biomass, NDF, and SCSm, respectively.
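The ± 1.8 Mb window search described in this section can be sketched as a simple interval-overlap query. The gene identifiers, coordinates, and the helper `genes_in_window` are hypothetical; in practice the gene models would come from the Svevo durum wheat annotation.

```python
def genes_in_window(qtn_chrom, qtn_pos, genes, window=1_800_000):
    """Return IDs of genes overlapping [qtn_pos - window, qtn_pos + window].

    genes: iterable of (gene_id, chromosome, start_bp, end_bp) tuples.
    """
    lower = max(0, qtn_pos - window)
    upper = qtn_pos + window
    return [gene_id for gene_id, chrom, start, end in genes
            if chrom == qtn_chrom and start <= upper and end >= lower]


# Hypothetical gene models (illustration only).
genes = [
    ("gene_a", "4A", 10_000_000, 10_004_000),  # contains the QTN position
    ("gene_b", "4A", 12_500_000, 12_503_000),  # beyond the upper bound
    ("gene_c", "4B", 10_100_000, 10_102_000),  # different chromosome
    ("gene_d", "4A", 8_150_000, 8_201_000),    # overlaps the lower bound
]
candidates = genes_in_window("4A", 10_000_000, genes)
print(candidates)
```

The window size mirrors the genome-wide LD decay distance reported above, so genes inside it are plausibly in LD with the QTN.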

Marker validation through molecular methods

Based on the allelic effects of the 16 reliable QTNs, one marker (Q.Scsb-3B, RFL_Contig3228_2154) associated with the understudied trait SCSb was selected and validated on 34 accessions included in the panel under study, using two different molecular methods (HRM and rhAmp). The marker RFL_Contig3228_2154 was able to distinguish genotypes with strongly contrasting phenotypes based on their allele (Fig. 6, Supplementary Table 2), since the homozygous “AA” and “aa” profiles were associated with low and high values of SCSb, respectively. Both HRM and rhAmp analyses showed that most accessions had the allelic profile “AA”, whereas six showed the homozygous “aa” genotype.

Fig. 6

Validation of RFL_Contig3228_2154 on thirty-four genotypes using two approaches. a) Melting temperatures (Tm) from HRM analysis and b) allelic discrimination plot from the rhAmp assay. Each dot represents a genotype, while the allele states (homozygous for the reference allele, homozygous for the alternate allele) are labeled with different colors. Variant 1 means homozygous allele 1 / allele 1, whereas variant 2 refers to homozygous allele 2 / allele 2

Discussion

Wheat straw is an attractive substrate for second-generation biofuel production because it complements and augments wheat production rather than competing with food production. Whilst many wheat varieties were developed to optimize yield and grain quality for human and animal consumption, little emphasis was given to developing the non-food components for biorefining purposes, probably also because standard chemical analysis of large numbers of samples was too expensive and time-consuming to be used in breeding programs.

A large variability was detected in the panel selected in this study, confirming what was previously observed in large tetraploid wheat germplasm collections by Laidò et al. [29] and Taranto et al. [30]. The great genetic diversity reflected the evolutionary history of tetraploid wheats. Indeed, the wild and domesticated accessions were separated from the durum wheat cultivars. The latter were spread along the PCA axes mainly according to the year of release [29]. Indeed, the ancient/old varieties clustered close to ssp. turgidum, turanicum and polonicum, while the modern cultivars were separated from the other samples. This large genetic variability of elite cultivars may be explained by the fact that they derived from national and international breeding programs developed during the last thirty years [13].

By contrast, the variation in the chemical composition of the biomass and related traits did not always reflect the large genetic diversity, confirming previous observations by Joshi et al. [31] and Blümmel et al. [32] on wheat straw in South Asia. Indeed, only moderate variation was observed when comparing the phenotypic distributions among wheat subspecies. Nevertheless, robust QTNs and genotypes carrying superior straw traits were identified, probably owing to the sensitivity of the ML-GWAS approach. Among the ML-GWAS models, the FASTmrMLM method was relatively faster than the others, as also reported by Chaurasia et al. [25]. ISIS EM-BLASSO detected the highest number of significant associations, whereas the lowest number was found with mrMLM.

Plant height plays a crucial role in biomass accumulation and grain yield, and all six multi-locus models identified QTNs associated with this well-studied trait on chromosomes 1A, 2A, 3A, 4B, 5B, 7A, and 7B, consistent with QTLs reported in previous studies [33,34,35,36]. Two of them were located ~ 9 kb from Rht-B1, the main gene controlling PH on chromosome 4B [37], in the same genomic region where a QTL for the same trait was identified by Vitale et al. [38].

Unlike other cereal species such as maize, rice, and barley, in which numerous studies have mapped QTLs related to straw composition [39,40,41], studies in wheat are scarce and often based on a limited number of genotypes. Malik et al. [42], searching for significant SNP markers associated with quality parameters of wheat straw, identified marker-trait associations (MTAs) on chromosomes 1A, 1B, 4A, 4B, and 6A for glucose, xylose and arabinose, all traits crucial for increasing sugar release for bioethanol production (Supplementary Table 4). We found a marker on chromosome 4A (Excalibur_c24511_1196) associated with CEL (Q.Cel-4A), located 22 Mb from the MTA for arabinose (GENE-1756_115) identified by Malik et al. [42] (Supplementary Table 4). By contrast, although we found QTNs for ADF, NDF, and Biomass on the same chromosomes detected by Malik et al. [42], our markers were located at least 100 Mb away, which makes it difficult to validate their findings against ours. Comparing the QTLs reported in the literature with the regions identified in our study, reliable QTNs on chromosomes 2A, 2B, 4B, 4A, and 5A (Q.Biomass-2B.2, Q.Biomass-4B, Q.Hi-4A, Q.Adf-5A, Q.Cel-2A, and Q.Ndf-4B.1) were coincident with previously reported QTLs for biomass accumulation [43], HI [44], grain yield [36, 45] and heading date [45] (Supplementary Table 4).

Additional QTNs associated with biomass composition on chromosomes 3B (Q.Adf-3B), 2B (Q.Adl-2B.2), 4A (Q.Cel-4A), 4B (Q.Ndf-4B.2), and 5A (Q.Hem-5A) were coincident with QTLs previously identified for grain yield [45], phenolic acid content [46], biomass [36, 47], shoot dry weight [48] and heading date [47, 49], respectively. Recently, Joshi et al. [31] carried out a GWAS on 287 spring wheat lines to map straw fodder quality traits and identified associations for ADF, ADL, and NDF on chromosomes 1A, 2B, 3A, 5A and 5B. In our work, we found a QTN on chromosome 2B (Q.Adl-2B.1) located 21 Mb from the MTA (Excalibur_c49875_479 on chromosome 2B) described by those authors for the same trait (ADL) (Supplementary Table 4). By contrast, no chromosomal region overlapping with our results was found for the remaining traits. The fact that loci associated with several biomass-related traits mapped to these regions makes them an interesting source of allelic variation to modulate their phenotypic expression. Three associations identified in this study for SPL on chromosomes 1A (Q.Spl-1A.1, Q.Spl-1A.2) and 3B (Q.Spl-3B) agreed with the QTLs previously identified by Graziani et al. [50] and Maccaferri et al. [48] for the number of spikes per square meter and total root number, respectively.

Among the agronomic traits analyzed, stem solidness was also considered. The morphological features of solid-stemmed wheat suggest that it could be highly resistant to lodging [51]. In addition, solid-stemmed wheat varieties are known to have increased resistance to damage from sawfly larvae, as the presence of solid pith impedes larval growth and migration [52]. In fact, wheat stem sawfly (WSS) resistant varieties with pith-filled solid stems have been selected in North America and in central Europe to help control WSS since the 1950s. Several studies have sought to identify the genetic basis of stem solidness, whereas fewer have explored the differences in biochemical composition between hollow- and solid-stemmed varieties. Recently, Nilsen et al. [53] demonstrated that copy number variation of TdDof, a gene encoding a putative DNA binding with one finger protein, affected the stem solidness trait in wheat at the SSt1 locus on chromosome 3BL. More recent genetic studies have identified a second allele at the Qss.msub-3BL locus contributing to stem solidness in durum wheat. This allele was first identified in the cultivar Conan and was designated Qss.msub-3BL.c [1, 54]. The Qss.msub-3BL.c allele conferred a solid-stem phenotype at the early stage of stem elongation that, unlike the phenotype conferred by the Rescue-derived Qss.msub-3BL.b allele, was lost later during stem elongation and maturation. Because stem solidness was scored at harvest time in this study, this could explain why no associations with the SSt1 locus on chromosome 3B were found. In addition, minor QTLs were identified on chromosomes 2A, 2D, 4A, and 5A that were found to synergistically enhance expression of SSt1 to increase stem solidness [55].
These previous results supported the SNP associations found in the present study for stem solidness (SCS) at three levels of the culm (basal, medium and apical) on chromosomes 1A, 2B, 3A, 3B, 4A, and 6B. Unfortunately, the three regions mapped on 3B (Q.Scsb-3B, Q.Scsa-3B and Q.Scsm-3B) did not coincide with the region of the SSt1 locus [55]; instead, they coincided with QTLs previously mapped for disease resistance traits such as yellow rust resistance [56] and fusarium head blight resistance [57]. Similarly, the QTNs identified for the SCS traits on the other chromosomes overlapped with QTLs previously mapped for disease resistance, such as Q.Scsm-4A, Q.Scsb-2B and Q.Scsm-6B.3 for leaf, yellow and stem rust ([58, 59]; Liu et al. 2017b), suggesting their potential involvement in other genetic resistance mechanisms in addition to the well-known resistance to WSS. Most durum wheat accessions do not possess the solid-stem Qss.msub-3BL.b allele for stem solidness and have been traditionally classified as hollow-stemmed. However, hollow-stem durum wheat typically has more resistance to WSS than hollow-stem hexaploid wheat [60]. Therefore, despite several studies aiming to map the loci responsible for the solid stem phenotype, the underlying molecular mechanisms contributing to this key trait remain elusive. The validated SNP marker (RFL_Contig3228_2154) associated with SCSb in the present work was previously related to different traits such as grain weight and gluten components (HMW-GS and LMW-GS) [13, 61]. This marker can now be used for MAS to track differences in SCSb in tetraploid wheat accessions.

Candidate genes surveying revealed genes involved in lipid metabolism, cell wall modifications and cell cycle

In our work, we found different classes of candidate genes in QTNs/genomic regions. For example, genes involved in the synthesis of principal components of the secondary walls of eudicotyledons (i.e., cellulose, lignin, and 4-O-methyl glucuronoxylan) were discovered within QTNs related to the chemical composition of the biomass. These polymers are the most abundant constituents of plant cell walls and thus the major components of plant biomass. They interact with themselves and with each other via covalent and noncovalent bonds to form a macromolecular network that determines the biological and physical properties of the secondary wall. Here, we detected a Cellulose synthase (CESA), a Glucuronoxylan 4-O-methyltransferase, and three different peroxidases associated with SCSa, ADF and NDF, respectively. In Arabidopsis, CESA1, CESA3 and CESA6 (or CESA6-like) are required for primary wall cellulose synthesis. Chu et al. [62] observed that the knockout of AtCESA2 caused severe defects in cell wall formation that led to abnormal plant growth and development. By contrast, transgenic lines overexpressing CESA2 showed enhanced growth performance with increased biomass production. Similarly, PmCESA2 in poplar led to an altered cell wall polysaccharide composition, which resulted in the thickening of the secondary cell wall and increased xylem width [63]. Consequently, the cellulose and lignin contents were increased. Consistent with these studies, CESA could be used as a potential candidate gene to enhance cellulose synthesis and biomass accumulation in wheat. In line with the role of CESA, genes involved in secondary cell wall xylan synthesis and its modification (e.g., GXMT) are also important for biomass production [64, 65].
Since genetic approaches have provided limited insight into the mechanisms of 4-O-methyl glucuronoxylan synthesis, our candidate gene annotated as Glucuronoxylan 4-O-methyltransferase may represent a new target to selectively manipulate polysaccharide O-methylation, providing new opportunities to modulate biopolymer interactions in the wheat cell wall. It is noteworthy that the presence of lignin in cell walls is also important, since it imparts recalcitrance to the deconstruction of wall materials for pulping and biofuel production [66, 67]. To reduce cell wall recalcitrance, a great deal of effort has been invested in engineering lignin and its composition (Van Acker et al. 2014; [20, 21, 68]). In model plants, down-regulation or silencing of genes encoding peroxidases (PRX2, PRX3, PRX22, PRX60, PRX71, and PRX72) resulted in reduced lignin accumulation and altered lignin composition [69, 70]. Consistent with these studies, we found three different peroxidases (PRX1, PRX22, and PRX36) that might be important for lignin production and/or its degradation.

In addition to the well-known genes reported above, other candidate genes with a role in cell architecture, plant growth regulation, photosynthetic pathways, and microRNA biogenesis were also found. For example, an Anaphase promoting complex (APC/C) gene was significantly associated with the CEL trait (Q.Cel-2A). It has been shown that overexpression of the Arabidopsis APC3a/CDC27a gene in tobacco accelerated plant growth, leading to plants with increased biomass [71]. Similar results were obtained in tobacco plants overexpressing the APC10 gene from Arabidopsis, which showed increased biomass and reduced life cycle length [72]. Another interesting candidate is HYPONASTIC LEAVES 1 (HYL1). This gene encodes a nuclear double-stranded RNA-binding protein that is involved in microRNA (miRNA) biogenesis and in the regulation of miR156 [73, 74]. The overexpression of miR156 in Arabidopsis increased total leaf number and biomass [75]. Similarly, alfalfa plants overexpressing miR156 had reduced internode length and stem thickness and elevated biomass production [76]. In red clover, overexpression of miR156 increased the number of shoots, delayed flowering, and accelerated biomass accumulation [77].

In addition, we found a Transaldolase (TAL) within the region flanking the QTN Q.Spl-3B. In Pichia stipitis, Chen et al. [78] identified TAL as a rate-limiting enzyme for xylose-to-ethanol bioconversion. Indeed, beyond the growing understanding of the molecular mechanisms involved in biomass production and composition, it is also important to consider the conversion of biomass products to biofuel. Using overexpression lines, Chen et al. [78] reported increases in ethanol production of 36% and 100%, suggesting that improving Transaldolase activity in P. stipitis can significantly increase the rate and yield of xylose conversion to ethanol. Thus, the superior alleles with significant effect identified in the present study (i.e., those for ADF, NDF, CEL, and SCSa) may have a critical role in improving biomass composition in wheat varieties, with positive effects on bioethanol production.

Conclusions

Our study provides new insights into the genetic basis of biomass composition traits in tetraploid wheat. The application of six ML-GWAS models to a panel of diverse wheat genotypes provided an efficient approach to dissect complex traits with low heritability such as wheat straw composition. A total of 72 reliable QTNs were detected by two or more models. Among the major QTNs identified in this study, 16 QTNs showed a significant effect on the corresponding phenotypes. Further, putative candidate genes were identified from the associated genomic regions. In addition, a marker associated with SCSb has been validated through molecular screening (HRM and rhAmp), providing a reliable marker for MAS applications. The discovery of genes/genomic regions associated with biomass production and straw quality parameters is expected to accelerate the development of high-producing wheat varieties useful for biofuel production. The information generated in this study would also be useful as a basis for further functional investigation, especially in the genomic region close to the validated marker, and to define a new wheat ideotype.

Methods

Plant materials and field experiments

The tetraploid wheat (Triticum turgidum L., 2n = 4x = 28; AABB genome) collection used in this study was comprised of 185 accessions available in the germplasm bank at CREA Research Centre for Cereal and Industrial Crops in Foggia. The panel, including wild, domesticated and cultivated accessions of seven subspecies (dicoccoides, dicoccum, carthlicum, polonicum, turanicum, turgidum, and durum), was chosen to represent a wide phenotypic variability for the main morphological traits that were evaluated in this study.

The wheat collection was grown in southern Italy at the experimental farm of CREA Research Centre for Cereal and Industrial Crops at Foggia (41°27′36″ N, 15°30′05″ E) for three growing seasons (2009, 2010, 2012) on a clay-loam soil (Typic Chromoxerert), with the following main chemical characteristics: organic matter (Walkley–Black method) 2.5 and 2.6%; available phosphorus (Olsen method) 62.0 and 68.0 mg kg−1; exchangeable potassium (ammonium acetate method) 422 and 450 mg kg−1; total nitrogen (Dumas method) 1.3 and 1.1%. The genotypes were sown on recommended dates and arranged in randomized complete blocks with 2 replications. Plots comprised eight rows of 7.5 m in length with a distance between rows of 0.17 m. The sowing density was always 350 seeds m−2. The field experiments were supplied with 45 kg/ha N and 115 kg/ha P2O5 as pre-sowing and 85 kg/ha N as top dressing each year. Weeds, pests, and fungal diseases were chemically controlled.

Morphological traits

Plant height (PH, in centimeters) was measured at the milk-waxy maturation stage, when the maximum height was achieved, from the ground to the tip of the ear (excluding awns) on five main culms per plot. To evaluate stem solidness, more than 5 stems were randomly selected at post-anthesis and cut cross-sectionally at the center of each internode in the basal (SCSb), median (SCSm), and apical (SCSa) part of each stem. The level of stem solidity was rated as 1–5 (1 for hollow and 5 for solid) using the UPOV scoring system [79]. At physiological maturity, above-ground dry matter was determined by cutting plants at the soil surface from a 1 m2 area (6 rows × 0.95 m). The plants collected were oven-dried at 70 °C for 48 h and weighed for total dry matter. Then, the spikes were cut, measured in length (SPL, cm), and threshed, and the grain was weighed (GW). Straw dry weight (Biomass) was calculated as the difference between above-ground biomass and grain weight. Harvest index (HI) was calculated as the ratio of grain weight to total biomass. Trait acronyms are reported in Supplementary Table 3.
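The two derived traits defined above (straw dry weight and harvest index) reduce to simple arithmetic on the plot sample. A minimal sketch follows; the function name and the example weights are hypothetical and are not taken from the study.

```python
def straw_and_harvest_index(total_dry_matter_g, grain_weight_g):
    """Return (straw dry weight, harvest index) for one 1 m2 plot sample.

    Straw dry weight = above-ground dry matter - grain weight.
    Harvest index    = grain weight / above-ground dry matter.
    """
    if total_dry_matter_g <= 0:
        raise ValueError("total dry matter must be positive")
    if not 0 <= grain_weight_g <= total_dry_matter_g:
        raise ValueError("grain weight must lie between 0 and total dry matter")
    straw = total_dry_matter_g - grain_weight_g
    harvest_index = grain_weight_g / total_dry_matter_g
    return straw, harvest_index


# Hypothetical sample: 1200 g of above-ground dry matter, 480 g of grain.
straw, hi = straw_and_harvest_index(1200.0, 480.0)
print(straw, hi)
```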

Cell-wall chemical analysis

Cell-wall carbohydrates were quantified by determination of acid detergent fiber (ADF), acid detergent lignin (ADL), and neutral detergent fiber (NDF) according to the methods of Van Soest et al. [80] using an ANKOM 220 Fibre Analyzer (ANKOM Technology Corporation, NY, USA). Hemicellulose was calculated as NDF – ADF and cellulose as ADF – ADL [81]. Trait acronyms are reported in Supplementary Table 3.
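The two difference formulas above can be written as a short sketch. The function name and the example percentages are hypothetical illustrations, and all three inputs are assumed to be expressed on the same basis (for example, % of dry matter).

```python
def fiber_fractions(ndf, adf, adl):
    """Derive hemicellulose and cellulose from detergent fiber fractions.

    hemicellulose = NDF - ADF
    cellulose     = ADF - ADL
    """
    if not ndf >= adf >= adl >= 0:
        raise ValueError("expected NDF >= ADF >= ADL >= 0")
    return {"hemicellulose": ndf - adf, "cellulose": adf - adl}


# Hypothetical straw sample (values in % of dry matter).
print(fiber_fractions(ndf=75.0, adf=48.0, adl=7.0))
```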

DNA material and Plant genotyping

Genetic variation data, generated using the Illumina wheat 90 K iSelect Assay developed by TraitGenetics [82], were extracted from a larger population deposited at the Mendeley Data website (https://data.mendeley.com) under the DOI 10.17632/rt2gmzbvmz.1. The whole dataset can be downloaded using the link (https://data.mendeley.com/datasets/rt2gmzbvmz/1). The raw dataset for the 185 genotypes under study was filtered with plink [83], removing SNPs with a call rate lower than 95% or a minor allele frequency (MAF) lower than 5%. After filtering, a total of 20,755 SNPs were used for the downstream analysis. The resulting VCF file containing only the 185 individuals under study is available at the Figshare data repository (https://figshare.com) under the DOI 10.6084/m9.figshare.18586076. Data can be downloaded using the following link: https://doi.org/10.6084/m9.figshare.18586076.v1.
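The filtering logic described above can be illustrated with a small sketch. This is not the authors' plink command, which the text does not give; in plink, filters of this kind are commonly expressed with the `--geno` and `--maf` options. The genotype coding (0/1/2 copies of the alternate allele, `None` for a missing call), the function names, and the toy data are all assumptions made for illustration.

```python
def call_rate(genotypes):
    """Fraction of samples with a non-missing genotype call."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)


def minor_allele_frequency(genotypes):
    """Frequency of the rarer allele among called genotypes (coded 0/1/2)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    alt_freq = sum(called) / (2 * len(called))
    return min(alt_freq, 1.0 - alt_freq)


def passes_qc(genotypes, min_call_rate=0.95, min_maf=0.05):
    """Keep a SNP only if it meets both thresholds quoted in the text."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Toy data for 20 hypothetical samples.
snps = {
    "snp_a": [0] * 10 + [2] * 10,            # complete calls, common alleles
    "snp_b": [0] * 19 + [1],                 # rare allele, MAF = 0.025
    "snp_c": [0] * 9 + [2] * 9 + [None] * 2, # call rate = 0.90
}
kept = [name for name, g in snps.items() if passes_qc(g)]
print(kept)
```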

Principal component analysis (PCA) was calculated on the resulting SNPs using SNP & Variation Suite (SVS) v.8.4.0 (Golden Helix Inc.) and drawn in R [84].

Multi-locus genome-wide association analysis

Association analysis was performed using multi-locus random-SNP-effect MLM (mrMLM) [24], fast mrMLM (FASTmrMLM) [85], iterative modified-sure independence screening expectation–maximization-Bayesian least absolute shrinkage and selection operator (ISIS EM-BLASSO) [23], integration of Kruskal–Wallis test with empirical Bayes (pKWmEB) [86], fast multi-locus random-SNP-effect efficient mixed model analysis (FASTmrEMMA) [87], and polygenic-background-control-based least angle regression plus empirical Bayes (pLARmEB) [27]. All ML-GWA models were run using mrMLM v4.0 [28], downloaded from http://cran.r-project.org/web/packages/mrMLM/index.html. The kinship matrix (K) was calculated with the specific option implemented in the mrMLM v4.0 package [28] and used in all methods as covariate. Default values were used for all parameters. In particular, the REML option for the likelihood function was used for the FASTmrEMMA model, whereas bootstrapping was chosen for the pLARmEB model. The association analysis was conducted using two approaches: i) Kinship matrix, ii) K + PCA as Q matrix. As proposed by Zhang et al. [88] for multi-locus GWA analysis, we used LOD = 3.0 (or P = 0.0002) as a cut-off to balance high power and a low false-positive rate for QTN detection. In addition, SNP markers detected by two or more different models were designated as reliable QTNs, as suggested by Chaurasia et al. [25]. QTNs with r2 values > 10% were declared as major, as also shown by Chaurasia et al. [25].
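The correspondence between LOD = 3.0 and P = 0.0002, and the "two or more models" rule, can be checked with a short sketch. It assumes the usual convention that the likelihood-ratio statistic 2·ln(10)·LOD follows a chi-square distribution with one degree of freedom; the function names and the example hits are hypothetical and do not come from the mrMLM package or from the study results.

```python
import math
from collections import defaultdict


def lod_to_p(lod):
    """P-value for a LOD score, assuming LRT = 2*ln(10)*LOD ~ chi-square(1 df).

    For 1 df, the chi-square survival function is erfc(sqrt(x / 2)).
    """
    lrt = 2.0 * math.log(10.0) * lod
    return math.erfc(math.sqrt(lrt / 2.0))


def reliable_qtns(hits, lod_cutoff=3.0, min_models=2):
    """Return SNPs passing the LOD cut-off in at least `min_models` models.

    `hits` is an iterable of (model_name, snp_name, lod) tuples.
    """
    models_per_snp = defaultdict(set)
    for model, snp, lod in hits:
        if lod >= lod_cutoff:
            models_per_snp[snp].add(model)
    return sorted(snp for snp, models in models_per_snp.items()
                  if len(models) >= min_models)


# Hypothetical hits; SNP names and LOD scores are made up.
hits = [
    ("mrMLM", "snp_1", 4.2),
    ("pLARmEB", "snp_1", 3.5),
    ("FASTmrMLM", "snp_2", 5.0),  # significant in a single model only
    ("pKWmEB", "snp_3", 2.1),     # below the LOD cut-off
    ("mrMLM", "snp_3", 3.8),
]
print(lod_to_p(3.0))
print(reliable_qtns(hits))
```

Under this assumption, LOD = 3.0 maps to a P-value of roughly 0.0002, which matches the threshold quoted in the text.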

Principal component analysis (PCA), analysis of variance (ANOVA), broad-sense heritability (H2) and Pearson correlation

A two-way analysis of variance (ANOVA) was implemented to investigate the genotype and year effects, their interaction (genotype x year) and residuals. Broad-sense heritability (H2) was estimated as follows:

$$H^2=\frac{\sigma_g}{\sigma_g+\left(\sigma_{gy}/y\right)+\left(\sigma_e/\tau y\right)}$$
(1)

where σg is the genotypic variance, σgy the variance explained by the interaction between genotype and year, σe the variance of residuals, τ the number of replicates and y the number of years. Best linear unbiased predictions (BLUPs) of phenotypic traits collected over years were calculated using the following mixed linear model:

$$y_{ij}=\mu+g_i+t_j+[gt]_{ij}+e$$

where y_ij are the observed trait values, μ is the overall mean, g_i is the effect of the ith line, treated as a random effect, t_j is the effect of the jth trial (year), modelled as a random effect, [gt]_ij is the genotype-trial interaction, and e is the residual effect, considered random and assumed to follow a normal distribution, e ~ N(0, Iσ_e^2). The model was implemented using the function lmer in the R package lme4 [89]. The normal distribution of BLUP data was verified using the Shapiro-Wilk test. In addition, principal component analysis (PCA) was performed with BLUP values. ANOVA, BLUPs, PCA, and correlation analyses (Pearson’s correlation with significance level α = 0.05) were carried out using the FactoMineR [90], lme4 [89], factoextra [91], and corrplot [92] R packages.
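As a worked illustration of Eq. (1), the heritability estimate can be computed directly from the variance components. The sketch below uses the design described in the Methods (three years, two replicates); the variance components are hypothetical numbers chosen for easy arithmetic, and the function name is an assumption.

```python
def broad_sense_heritability(var_g, var_gy, var_e, n_years, n_reps):
    """Broad-sense heritability following Eq. (1).

    H2 = var_g / (var_g + var_gy / y + var_e / (tau * y)),
    with y = number of years and tau = number of replicates.
    """
    if n_years <= 0 or n_reps <= 0:
        raise ValueError("n_years and n_reps must be positive")
    denominator = var_g + var_gy / n_years + var_e / (n_reps * n_years)
    return var_g / denominator


# Hypothetical variance components: 4 / (4 + 3/3 + 6/(2*3)) = 4 / 6.
h2 = broad_sense_heritability(var_g=4.0, var_gy=3.0, var_e=6.0,
                              n_years=3, n_reps=2)
print(h2)
```

Dividing the interaction and residual variances by the number of years and of year-by-replicate observations reflects that H2 here refers to genotype means over the whole trial series rather than to single-plot values.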

Candidate genes

Putative candidate genes were searched in the flanking regions of the significant QTNs. The linkage disequilibrium (LD) decay value was calculated using the LD Adjacent Pairs Analysis function (SVS) and then used to define the confidence interval.

Gene annotation was then retrieved from the Svevo durum wheat high-confidence gene models (https://www.interomics.eu/durum-wheat-genome). Putative candidates were used as queries for a BLASTn search against the NCBI database to assign gene names based on direct orthologs in Oryza sativa.
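The flanking-region search described above can be sketched as follows. The sketch assumes that the confidence interval is symmetric around the QTN and that positions and the LD decay distance share one unit; neither detail is stated in the text. Function names, gene names, and coordinates are hypothetical.

```python
def candidate_interval(qtn_position, ld_decay, chrom_length=None):
    """Symmetric window of +/- ld_decay around a QTN, clipped to the chromosome."""
    start = max(0, qtn_position - ld_decay)
    end = qtn_position + ld_decay
    if chrom_length is not None:
        end = min(end, chrom_length)
    return start, end


def genes_in_interval(genes, interval):
    """Names of genes overlapping the interval.

    `genes` is an iterable of (name, gene_start, gene_end) tuples.
    """
    start, end = interval
    return [name for name, g_start, g_end in genes
            if g_start <= end and g_end >= start]


# Hypothetical QTN at 5,000,000 with an LD decay distance of 1,500,000.
interval = candidate_interval(5_000_000, 1_500_000)
genes = [
    ("gene_a", 3_400_000, 3_600_000),  # overlaps the left edge
    ("gene_b", 7_000_000, 7_010_000),  # outside the window
    ("gene_c", 6_000_000, 6_002_000),  # inside the window
]
print(interval, genes_in_interval(genes, interval))
```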

SNP marker assay validation

Two different molecular methods (High-Resolution Melting analysis (HRM) and rhAmp allelic discrimination assay) were used to validate the SNP marker associated with basal stem solidness, using a panel of 34 accessions with contrasting SCSb values and allelic profiles. For HRM, Primer3 software version 4.0.0 (Whitehead Institute for Biomedical Research, Cambridge, MA; http://primer3.ut.ee) was used to design primers. The HRM analyses were performed in 384-well plates on the QuantStudio 12 K Flex (Life Technologies, USA), following the procedure described by Ravi et al. [93], whereas the rhAmp allelic discrimination assay was carried out following the procedure described by Broccanello et al. [94] and Ravi et al. [93]. Sequences of rhAmp assays are available upon request.