Background

Stalk lodging can seriously influence photosynthesis and substance transportation, and annually causes reductions in maize yield ranging from 5 to 20% worldwide [1]. Several factors, such as genetics, natural conditions, field management, diseases and insect pests, can result in weak plant standability and stalk lodging [2,3,4,5,6]. Strong stalks can reduce the occurrence of lodging because stalk mechanical strength is negatively correlated with stalk lodging [7,8,9,10,11,12]. Hence, stalk strength can be used to evaluate stalk lodging-resistance. Several approaches have been developed to measure stalk strength, including stalk bending strength, stalk crushing strength, and rind penetrometer strength (RPR) [13,14,15,16,17,18]. Compared to other methods, RPR has obvious advantages in terms of its simple and efficient operation and no requirement for destroying the structure of the stalk and impacting plant growth [1, 14, 15, 19,20,21]. Consequently, investigating RPR in multiple growth stages can be viewed as an available and feasible strategy to determine the optimal measurement period. In addition, RPR has been significantly and positively correlated with stalk lodging-resistance in previous studies [9, 14, 15, 22]. Moreover, purposeful selection for RPR has been performed and has been shown to be able to simultaneously improve stalk quality and lodging-resistance [8, 15, 23,24,25].

As for the genetic architecture of RPR, previous studies have used association and linkage mapping to identify quantitative trait loci (QTL) with the purpose of developing functional markers to carry out marker-assisted selection (MAS) to improve stalk strength. Specifically, multiple F2:3 populations genotyped by simple sequence repeats (SSR) markers have been used to dissect the genetic basis of RPR, and then to compare the efficiency of phenotypic selection and MAS. A total of 35 QTL have been detected that corresponded with RPR, which clearly implies the complex nature of stalk strength [1, 21]. Moreover, MAS for high RPR has been shown to be more effective than phenotypic selection when the QTL are derived from the same population rather than from separate populations [23]. In addition, several recombinant inbred line (RIL) and double haploid (DH) populations have been constructed to explore RPR-related loci, and high-quality linkage maps have been established using single nucleotide polymorphism (SNP) markers. The potential candidate genes predicted from these studies are directly or indirectly associated with the biosynthesis of lignin and cellulose, illustrating that cell wall components are likely involved in the determination and formation of RPR [16, 19, 20, 26, 27]. On the other hand, a range of significantly associated loci related to RPR have been detected in genome-wide association study (GWAS) for maize nested association mapping panel and natural population, further indicating the great genetic complexity of RPR [28, 29]. Despite that, genomic selection (GS) for stalk strength has rarely been reported, and unsatisfactory prediction accuracies have been obtained in previous studies [16, 28]. The predictive ability needs to be improved to be able to predict stalk strength in a GS strategy for practical breeding programs.

In this study, we focus on dissecting the genetic architecture of RPR and providing several useful pieces of advice for plant breeders to improve the selection efficiency for stalk lodging-resistance. The datasets in the present study that contained phenotypic data that are evaluated in seven stages in two RIL populations and genotypic data obtained from SNP array were used to identify RPR-relevant loci and perform genomic selection. Our objectives were to (1) ascertain the optimal measure stage for stalk strength; (2) dissect the genetic architecture of RPR; (3) predict potential candidate genes and biological pathway associated to RPR; and (4) perform genomic selection by using multiple models to enhance breeding efficiency and seeking a few advisable measures for practical breeding schemes.

Results

Phenotypic variation and complex relationship between stages in RPR

According to previous studies and research results from our group [6, 10, 11, 30], the third stalk internode above the ground was selected to investigate RPR to assess stalk strength (Fig. 1a). The RPR values evaluated in multiple stages, environments and RIL populations exhibited normal distributions, and the difference between the smallest and largest values ranged from 1.90- to 3.33-fold for each stage (Additional file 1: Figure S1). Additionally, the phenotypic correlation coefficients between V10 and the other stages were lower than those between DTS and the stages after silking and much lower than rp between stages after silking (Additional file 1: Figure S1). A phenotypic clustering analysis was performed following the phenotypic variation of RPR in two RIL populations (Fig. 1b). The RPR values in the seven stages were classified into three groups, and the largest cluster consisted of AS10, AS20, AS30, AS40 and AS50, which belong to the reproductive growth phase. Stages V10 and DTS were separately classified into other two groups (Fig. 1b). On the other hand, the boxplot of RPR of each RIL population was drawn using data from various stages across environments. More precisely, RPR increased from stage V10 to stage AS50 in each environment, and then a slight enhancement was obtained ranging from stage DTS to subsequent stages (Fig. 1c). In addition, RPR in stage V10 in 2012H was remarkably higher than that in stage V10 in other environments. Moreover, RPR evaluated from stage DTS to stage AS50 within 2012B was larger than that in other environments in both RIL populations (Fig. 1c). Analysis of variance was performed to estimate the broad-sense heritability. The H2 of RPR in stage V10 in each population was lowest relative to the other stages. The broad-sense heritability varied from stage to stage and was larger than 67.0% in all stages except stage V10 (Additional file 1: Table S1).

Fig. 1
figure 1

Extensive phenotypic variation of rind penetrometer resistance in each RIL population. a Diagram of the RPR measurement. The cross-sectional area of the probe is 1 mm2. b Hierarchical clustering of RPR evaluated in seven stages in each RIL population. Height is defined as the Euclidean distance between clusters. c Phenotypic variation of RPR measured in seven stages across multiple environments. HO: the high-oil population (B73 × BY804); LR: the lodging-resistance population (Zheng58 × HD568); RPR: rind penetrometer resistance; V10: the tenth-leaf stage; DTS: days to silking; AS10: 10 days after silking; AS20: 20 days after silking; AF30: 30 days after silking; AS40: 40 days after silking; AS50: 50 days after silking; 2012B: Beijing in 2012; 2012H: Hainan in 2012; 2013B: Beijing in 2013

Construction and quality of high-density linkage map

A total of 15,167 SNPs were retained after quality control for genotypic data, and then a total of 11,691 SNPs passed the chi-square test with a significance level over 0.05 in the LR population (Fig. 2). A sliding-window approach was applied to assign bin markers, and a set of 2121 recombination bins were used to construct the genetic map. The average physical length of the bin markers was 971 Kb, with a minimal length of 5.1 Kb and a maximal length of 54.4 Mb (Additional file 1: Table S2). There were 19 bin markers with a length of more than 10 Mb, and a total of 15 bins were located at centromeric or pericentromeric regions distributed in ten chromosomes (Additional file 1: Table S3). The other four recombination bins with a long physical length, including lmk0203, lmk0979, lmk1002, and lmk1453, were in regions with a lower SNP coverage because most of the SNPs in these regions were disqualified by chi-square tests (Fig. 2, Additional file 1: Table S3). Regarding the analysis of genotypic data in the HO population, a total of 756 SNPs were retained without segregation distortion (chi-square test, P > 0.05), and then these markers were used to construct genetic map and perform further analyses. The linkage maps for the HO and LR populations were constructed by the R package. The map lengths were 1642.2 and 1519.5 cM for each RIL population, respectively. Moreover, the average genetic lengths between adjacent markers were 2.2 and 0.7 cM for the HO and LR populations, respectively, which were equivalent to approximately 2.6 and 0.97 Mb in physical length (Additional file 1: Table S4). To evaluate the quality of the linkage maps, plots were drawn to compare the order of markers, which illustrated an excellent collinearity between physical and genetic maps (Fig. 2, Additional file 1: Figure S2). In addition, QTL mapping of cob color in the HO population and silk color in the LR population were performed to assess the power and accuracy of the genetic maps of each population. The QTL for cob and silk color, called pC1 and pS10 here, respectively, were detected with the highest LOD values of 55.7 and 7.2 for the peaks located at 47.8 and 138.1 Mb on chromosome 1 and 10, respectively (Fig. 3). For pC1, a cloned gene, P1 (for pericarp color1), which conditions red flavonoid pigment and phlobaphene in the floral organs, including the kernel pericarp, cob glumes, tassel glumes, and silk [31, 32], lies in this QTL. A classical gene, R1, located in the QTL pS10 that regulates anthocyanin pigmentation in tissues is corelated with the color of the kernel pericarp and silk [33, 34].

Fig. 2
figure 2

Distribution of and analysis process for genotypic data in the LR (Zheng58 × HD568) population. The outmost layer with the scale represents ten chromosomes in maize. a Distributed proportion of each polymorphic SNP within the RIL population. The red color represents the ratio of individuals with genotypes derived from Zheng58; green color represents the ratio of individuals with genotypes derived from HD568; gray color is the reference line for 0.5. b Distribution of chi-square values for each SNP within ten chromosomes. The green color denotes unqualified markers with P values lower than 0.05. c Density of qualified SNPs based on the chi-square test (1.0 Mb window size). The scale with different colors is an indicator of the number of markers within the unit window size, and the numbers from inner to outer are 0 to 50 with spacing of 10. d Physical distribution of bin markers on each chromosome. e Distribution of bin markers in the linkage map. Gray lines denote the comparison of the order of bin markers shared between physical and genetic maps

Fig. 3
figure 3

Mapping of P1 and R1 in each RIL population. a QTL mapping of cob color in the HO population (B73 × BY804). The red dot denotes the relative physical position of the P1 gene. b QTL mapping of silk color in the LR population (Zheng58 × HD568). The red dot denotes the relative physical position of the R1 gene

QTL mapping of RPR in each RIL population

QTL mapping of RPR evaluated in seven stages in each environment was performed using a high-quality genetic map in the R package R/qtl version 1.44–9 [35]. A total of 66 QTL were detected for RPR in the HO population, which included 26, 20, and 20 QTL identified in the three environments. The physical lengths of the confidence intervals for these QTL spanned from 0.83 to 29.11 Mb, with an average length of 6.54 Mb. The phenotypic variance of RPR explained by each QTL ranged from 1.95 to 13.61%, with a mean value of 7.01% of the variation in RPR. The genetic effects for each QTL were estimated ranging from − 0.24 to 0.29 (Additional file 1: Table S5). Furthermore, the alleles derived from BY804 decreased the stalk strength in RPR when the value of the genetic effect was negative. On the other hand, alleles from B73 improved RPR when the genetic effect of the QTL was positive. A total of nine QTL could individually explain more than 10% of the phenotypic variance in RPR, and four of these QTL were detected in stage AS30 (Additional file 1: Table S5). In addition, a total of three QTL were identified in 2013B with overlapped confidence intervals and had higher genetic effects, indicating this QTL could improve RPR from 0.25 to 0.29 kg/mm2 (Additional file 1: Table S5). As for the LR population, a total of 45 QTL were identified for RPR in the two environments, which contained 23 and 22 QTL. The confidence intervals for each QTL spanned physical lengths from 0.70 to 64.49 Mb, with an average length of 8.37 Mb; 38 of these were shorter than 14.0 Mb in physical length (Additional file 1: Table S6). The phenotypic variance explained by each QTL for RPR ranged from 1.85 to 14.06%, with a mean value of 6.31% of the variation in RPR. Moreover, the genetic effects of each QTL were calculated ranging from − 0.19 to 0.26 (Additional file 1: Table S6). Besides, if the value of the genetic effect was positive, the alleles could derive from HD568 and enhance stalk strength in RPR. However, alleles could exert a negative effect on RPR when the genetic effect of QTL was negative (Additional file 1: Table S6). There were three QTL related to RPR that could singly explain more than 10% of the phenotypic variance, and 2 of these QTL were identified in stage AS10 (Additional file 1: Table S6). The relationship between the QTL number and broad-sense heritability was analyzed using BLUE values to perform QTL mapping in each population. The number of QTL increased as the broad-sense heritability estimated for RPR in each stage increased across environments (Fig. 4a). On the other hand, as the genetic correlation coefficient between stages increased, the number of overlapped QTL for both stages increased (Fig. 4b). Finally, a total of 18 pleiotropic QTL (pQTL) were detected by integrating the overlapped genomic region among 111 QTL for RPR in two populations, which were located on chromosomes 1 to 9 (Table 1). In particular, pQTL6–2, whose confidence interval encompassed 16 QTL that were identified in the HO population across three environments, was located on chromosome 6, with a physical length of 8.56 Mb. The phenotypic variance explained by this pQTL in different situations ranged from 3.57 to 13.31% of variation in RPR, with a mean of 7.81%. Furthermore, the genetic effects were estimated ranging from − 0.09 to − 0.24, with an average effect of − 0.19 (Table 1, Fig. 5, Additional file 1: Table S5). In addition, pQTL8 was identified and had a physical length of 22.42 Mb that included 9 QTL located on chromosome 8, of which eight QTL were derived from the LR population and another QTL was detected with a lower genetic effect in the HO population (Table 1, Additional file 1: Figure S3, Additional file 1: Table S5, S6).

Fig. 4
figure 4

Relationship between QTL mapping and genetic parameters. a Relationship between QTL number and broad-sense heritability. b Relationship between overlapped QTL number and genetic correlation between all pairs of stages. HO: the high-oil population (B73 × BY804); LR: the lodging-resistance population (Zheng58 × HD568)

Table 1 Pleiotropic QTL (pQTL) for rind penetrometer resistance in two RIL populations
Fig. 5
figure 5

Illustration of pQTL6–2 identified in various situations. Violin plots denote the difference between genotypes derived from each parent; HO: the high-oil population (B73 × BY804); RPR: rind penetrometer resistance; DTS: days to silking; AS10: 10 days after silking; AS20: 20 days after silking; AF30: 30 days after silking; AS40: 40 days after silking; AS50: 50 days after silking; 2012B: Beijing in 2012; 2012H: Hainan in 2012; 2013B: Beijing in 2013

GO enrichment and KEGG pathway analysis for candidate genes

According to the available database for maize gene annotation accessible at MaizeGDB, a total of 106 predicted candidate genes with physical regions corresponding to the confidence intervals of these QTL were selected and annotated. Moreover, these candidate genes were determined according to the list of classical genes described in the annotation database, and the description of biological function for the predicted genes was usually related to substance transportation and cell growth (Additional file 1: Table S7). GO analysis of the candidate genes illustrated that the enrichment items mainly included biological processes related to metabolism, biosynthesis, response to stress, and material transportation. In addition, the cell components relevant to genes consisted of the plasma membrane, Golgi apparatus, endoplasmic reticulum, and cell wall (Fig. 6a). As for the KEGG analysis of the predicted genes, a total of 12 pathways were identified (Fig. 6b). These pathways included the biosynthesis of secondary metabolites, starch and sucrose metabolism, plant hormone signal transduction, galactose metabolism, etc., which could be related to the formation of the cell wall and could contribute to the formation of RPR.

Fig. 6
figure 6

Analysis of GO enrichment and KEGG pathway based on predicted candidate genes. a Enrichment analysis of GO items. MF: molecular function related to candidate genes; CC: cellular component corresponding to candidate genes; BP: biological processes associated with candidate genes. b Analysis of KEGG pathway based on candidate genes. The color of the dot refers to the corrected P value; the size of the dot denotes the number of candidate genes in the pathway; the rich factor represents the ratio of the number of candidate genes in the pathway

Improving genomic selection for RPR using models considering fixed effects or multivariate

Prediction accuracies in the HO population using the UV model ranged from 0.06 to 0.52, and a minimal value was estimated using phenotypic data from 2012H. The rMP estimated in stage V10 in three environments was lower than in other stages (Fig. 7a, Additional file 1: Table S8). As for the performance of the UV model in the LR population, rMP changed from 0.38 to 0.58 in various stages across environments (Additional file 1: Table S8). However, improvement of rMP could be achieved relative to its estimation by the UV model when the QTL detected in each stage within different environments were considered as fixed effects in the GBLUP model. The maximum difference of rMP between the UV and FIXED models was 0.21 in stage V10 within 2012H. In general, the rMP values based on the FIXED model were higher than those calculated by the UV model. Moreover, the changes of rMP evaluated by the FIXED model ranged from 0.26 to 0.63 in the HO population and from 0.41 to 0.61 in the LR population (Fig. 7a, Additional file 1: Table S8). Compared to the FIXED model, the ME model could further enhance the prediction accuracy, for which the phenotypic data of RPR investigated in the first two environments were used to construct auxiliary variates in the multivariate model. The improvement of rMP between the FIXED and ME models ranged from 0.02 to 0.17 in each stage (Fig. 7a). Furthermore, the proportions of residual variance estimated by the FIXED and ME models were lower than those calculated by the UV model, which ranged from 0.43 to 0.71 in each stage. However, the proportions of residual variance evaluated by the FIXED model were overall higher than those in the ME model (Additional file 1: Table S9). Additionally, cross-validation was performed by the MS model using the RPR values evaluated in the first six stages as auxiliary variates to predict RPR in the seventh stage. The rMP was significantly increased in both RIL populations relative to its value estimated by the UV model (Fig. 7b). In addition, the proportions of variance components corresponding to the auxiliary variates in the MS model were very high, which can explain most of the phenotypic variance (Additional file 1: Table S10).

Fig. 7
figure 7

Comparison of the prediction accuracies between models. a Comparison of predictive ability between UV, FIXED and ME models in HO population. b Comparison of predictive ability between UV and MS models in two RIL populations. UV: univariate model, namely the general GBLUP model; FIXED: the GBLUP model considering RPR-relevant QTL as fixed effects; ME: multivariate GBLUP model using phenotypic data evaluated in other environments as auxiliary variates; MS: multivariate GBLUP model using phenotypic data evaluated in other stages as auxiliary variates; HO: the high-oil population (B73 × BY804); LR: the lodging-resistance population (Zheng58 × HD568); RPR: rind penetrometer resistance; V10: the tenth-leaf stage; DTS: days to silking; AS10: 10 days after silking; AS20: 20 days after silking; AF30: 30 days after silking; AS40: 40 days after silking; AS50: 50 days after silking; 2012B: Beijing in 2012; 2012H: Hainan in 2012; 2013B: Beijing in 2013; the 5-fold cross-validation scheme was implemented in this case

Discussion

Stalk strength is a highly important agronomic trait in maize because of its relationship with stalk lodging and grain yield. However, RPR, as a crucial measurement index, can efficiently and precisely evaluate stalk strength to improve the lodging-resistance of breeding lines. Hence, the genetic dissection of RPR can provide powerful assistance for the selection of candidate lines with high stalk strength based on functional molecular marker detected by association and linkage mapping [1, 16, 20, 28, 29]. Furthermore, the utilization of genomic selection can also accelerate the breeding process of complex traits without phenotyping in later breeding phases [36,37,38,39]. Taking full advantage of genomic information led to better genomic prediction of RPR in this study.

The relatively higher stalk strength in 2012B compared to that in other environments is likely attributed to the lower planting density in that environment, indicating that a high planting density may reduce RPR, which is consistent with a previous study [6]. According to the ANOVA results, RPR has relatively high broad-sense heritability, which is supported by several previous studies [1, 19, 20, 26, 27], illustrating that genetic effects can account for the most proportion of phenotypic variance in RPR, and that better selection of RPR can be achieved in early generations if target lines are used as parents to construct breeding populations to screen out varieties with high stalk strength. However, RPR, a complex quantitative trait, is controlled by multiple genes with minor effects, which has been discussed in previous studies [1, 19, 28, 29]. The breeding scheme of population recurrent selection may be more efficient for the pyramid of favorable alleles related to RPR [8, 15, 25]. In brief, combining early generation selection and population improvement can enhance the breeding efficiency of selecting breeding lines with high stalk strength. But a lower broad-sense heritability was estimated in stage V10 in each RIL population, which may be attributed to the fact that stage V10 is a vegetative growth stage and nutrient and dry weights greatly increase in this stage [40]. Individual plants have weak stalk strength due to the rapid growth of the internodes, which can be affected by nutrient deficiencies, heat, and drought [41]. As illustrated by the ANOVA results in the present study, nongenetic effects account for a higher proportion of phenotypic variance of RPR in stage V10. According to the results of the phenotypic clustering and correlation analyses, stage V10 was individually separated from other stages and was associated with other stages with lower correlation coefficients, which further indicated that RPR in various stages may be controlled by different genetic factors and that the last six stages might have a similar genetic basis. Nevertheless, DTS was not classified into common subgroup with other stages after silking, which was likely attributed to the fact that the latter stages belonged to the phase of kernel development undergoing grain filling to maturity. On the other hand, the difference in RPR values between stages except for stage V10 was small, as shown by the distribution boxplot of each RIL population. Moreover, the broad-sense heritability in these stages was relatively higher than it was in stage V10. Hence, RPR measured in the silking phase or stage after silking can be used to evaluate stalk strength, as shown by several previous studies that had been provided evidence directly and were performed in silking phase or a few weeks after flowering [4, 20, 23, 26, 28, 29, 42]. Finally, inbred lines with high stalk strength in this study can be selected as novel germplasms to make candidate crosses in the future.

Genetic maps of each RIL population were constructed by the R package based on the Kosambi mapping function. Classical and cloned genes, including P1 [31, 32] and R1 [33, 34], were detected in each RIL population, indicating that these constructed linkage maps had high quality and accuracy to allow subsequent analysis of QTL mapping. The broad-sense heritability of RPR varied from stage to stage and was positively correlated with the number of QTL detected in each stage. It is implied that more QTL can be identified to better dissect the genetic basis of complex traits if a high broad-sense heritability is estimated for the target traits. On the other hand, more overlapped and common QTL for RPR can be obtained between different stages when the genetic correlation coefficient of both stages is increasingly large. In general, the higher the genetic correlation between traits, the more common the QTL, which may be illustrated by the fact that these traits were controlled by alike or linked genes or had common metabolic pathways [43]. The position and number of QTL detected in each experimental population were generally different across stages and environments. It is implied that discrepant genetic mechanisms may exist for RPR, which has been investigated in various situations, and it is further indicated that gene expression may be characterized by spatiotemporal specificity and is activated at specific times during plant development. Besides, the phenotypic variance explained by each detected QTL was lower than 15%, which was consistent with the results of other studies [1, 19], indicating that RPR is controlled by multiple alleles with minor effects and that there is a lack of major QTL for this trait. However, there were 18 pleiotropic QTL with overlapped genomic regions that were identified in multiple stages. In particular, pQTL6–1 in the HO population was repeatedly detected 16 times in different stages across environments. In addition, the pleiotropic QTL, namely, pQTL8, was identified 9 times across various phases and environments, including 8 times in the LR population and one time in the HO population. This phenomenon illustrates that certain alleles related to RPR are steadily expressed across stages during the development of maize and contribute to the formation of stalk strength throughout the entire growth period. From another perspective, several QTL detected in this study, including qAhb1–2, qAhg2, qAhe2, qBhe2–1, and qBhc3, were identified and consistent with previous studies in which discrepant populations and genotypic data were used to perform association or linkage mapping to explore the genetic architecture of RPR [20, 27,28,29], which provides further support for the topic mentioned above that some QTL associated with RPR are steadily expressed in diverse experimental populations. These loci in the genome may be regarded as candidate genomic regions and can likely be used to perform fine mapping and identify functional genes to dissect the genetic mechanism of RPR. Additionally, the relatively obvious difference in QTL mapping for RPR among the experimental populations was determined according to the results of this study and other previous studies [1, 19, 21, 26]. A reasonable explanation of this difference is as follows: first, RPR is regarded as a complex quantitative trait with an intricate genetic mechanism. There may be epistatic effects in which a QTL can interact with one QTL in this experimental population and with another locus in other genetic background, so that the QTL will produce different genetic effects in different populations; the second explanation is that the QTL related to target traits can be legitimately detected following segregation and recombination within this region. In other words, the associated QTL cannot be identified in a situation in which both parents of an experimental population have identical alleles at a QTL; the third explanation is that many QTL with minor genetic effects will not be detected repeatedly because they likely lack sufficient statistical power for QTL mapping [1, 44]. Hence, further research is needed to break RPR into a few direct components or sub-factors that can be used to more effectively dissect the genetic basis and explore candidate genes for stalk strength with the purpose of providing advice for marker-assisted breeding.

As an efficient approach for exploring the genetic architecture of target traits, linkage mapping has been widely applied to identify QTL and explore functional genes in molecular genetics research. The identified QTL can be used to develop molecular markers to assist the practical breeding and accelerate the selection process. Several primary candidate genes were found in the MaizeGDB database that corresponded to RPR in this study. One candidate gene within pQTL6–2 with the gene model ID GRMZM2G031200 is located on chromosome 6 with a physical position of 164.69 Mb. The homologs of this gene in Arabidopsis encode regulated transcription factors, namely, secondary wall-associated NAC domain protein1 (SND1), which is required for the normal biosynthesis of the secondary wall and is a critical transcriptional switch to activate this developmental program. The SND1 combines with other transcription factors to constitute a transcriptional network that regulates downstream targets that affect the biosynthesis of the secondary wall in fibers [45]. Moreover, two candidate genes were detected within the genomic region of pQTL6–1 consisting of seven RPR-related QTL with the model IDs GRMZM2G027723 and GRMZM2G135108 that are relevant to the formation of cell wall components. The first gene is ZmCesA-2, which is required to produce cellulose and is involved in primary wall biosynthesis [46, 47]. The another gene, namely ZmPox3, is a critical gene in the process of lignin biosynthesis and is involved in monolignol polymerization and exerts a positive effect on cell wall digestibility [48]. In addition, a candidate gene located in pQTL4–2, ZmFBL41, has the biological function of resistance to banded leaf and sheath blight and indirectly influences the accumulation of lignin. This gene encodes an F-box protein (ZmFBL41) that interacts with the protein ZmCAD, and its knockout has a negative effect on ZmCAD degradation and thus promotes lignin biosynthesis and restricts lesion expansion [49]. These descriptions indicate that candidate genes corresponding to cell wall components may regulate and determine the formation of RPR. On the other hand, several studies have reported the results of QTL mapping for cell wall components [26, 50,51,52], and some of these QTL have overlapped confidence intervals with the QTL identified in this study. Regarding pQTL6–2 detected in the HO population, its genomic region is consistent with the physical position of QTL associated with lignin, acid detergent fiber (ADF), neutral detergent fiber (NDF), acid detergent lignin/NDF, and in vitro dry matter digestibility (IVDMD) identified in previous studies [50, 51, 53,54,55]. Based on the results of related studies, pQTL4–2 has a physical region that overlaps with other loci that are associated with IVDMD and lignin [49, 51], and the interval of pQTL8 is consistent with the QTL related to IVDMD, which have negative relationships with lignin content [26, 52]. Hence, this evidence implies that certain QTL have pleiotropic effects and can control both RPR and the content of cell wall components, which likely indicates that RPR is closely associated with cell wall components, such as cellulose, hemicelluloses, and lignin, consistent with the results of previous studies [20, 26, 28]. In addition, the results of the GO and KEGG analyses provide further support for the abovementioned scenario because the enrichment items and metabolic pathways associated with cellar components and the formation of the cell wall were identified in this study. Consequently, candidate genes relevant to RPR are likely involved in the regulation and control of cell wall components, which may exert an important effect to improve RPR.

Genomic selection has been recognized as an efficient approach to select for complex traits in comparison with conventional marker-assisted selection [37, 38, 56, 57]. In the present study, the prediction accuracy estimated in each stage and population was obviously different when the UV model was used to perform cross-validation, which was likely attributed to the different estimates of broad-sense heritability in various situations. This phenomenon is in accordance with previous studies illustrating that broad-sense heritability is an important factor that impacts the evaluation of prediction accuracy [58,59,60,61]. The information on functional loci identified by linkage mapping can be used as fixed effects in the GS model to improve the predictive ability of models, which was performed in this study and in previous research [62,63,64,65]. However, the prediction accuracy was increased when the fixed effects model was implemented using the QTL that explained the proportion of phenotypic variance lower than 10%, which was consistent with a previous study [66]. This result illustrates that QTL related to target traits have the potential ability to improve prediction accuracy and should be assigned important roles in the models. A remarkable improvement of prediction accuracy was achieved in this study when the multivariate model was applied to perform the GS. Several previous studies have shown that using correlated traits as auxiliary variates in the GS model can efficiently enhance the prediction accuracy and is obviously superior to the univariate model [67,68,69,70,71]. The increase of prediction accuracy estimated by the FIXED and multivariate models is mainly attributed to the higher proportion of genetic variance captured by these models than in the univariate model, as shown in the present study and previous researches [67, 72]. Another explanation may be that multivariate models likely capture both additive and nonadditive interaction effects by using auxiliary covariates in the models [73]. In brief, information on genetic dissection or additional auxiliary variates can be integrated into improved models to enhance the selection efficiency of complex agronomic traits, such as yield, RPR and other resistance-relevant traits. Hence, several advices for GS-assisted breeding programs can be concluded to improve selection efficiency and further enhance the genetic gain per breeding cycle. Regarding the target traits with complex genetic architecture, the first point is that the information of functional markers developed based on cloned genes or validated QTL can be applied to modified models to improve the precision of the estimated marker effects; the second point is that the information from genetic correlated traits can be used to achieve a higher prediction accuracy for the target trait, namely, by using other traits as auxiliary variates in statistical models; and the last point is that historical data accumulated by breeding experiments can be used to capture the interaction effects between the environment and genotype for the purpose of increasing the predictive ability of GS models. These points may allow better selection of candidate lines with good performance in practical GS-assisted breeding schemes.

Conclusions

Stalk lodging severely impacts plant standability and grain yield in maize. Stalk strength is an important agronomic trait and has a crucial effect on the improvement of lodging-resistance in modern maize breeding, and strong stalks can contribute to reduce lodging and achieve a harvestable yield. In the present study, phenotypic values evaluated in three environments and genotypic data identified by SNP array from two RIL populations were used to perform genetic dissection and genomic selection for rind penetrometer resistance. RPR has high genetic complexity and varies from stage to stage. Higher broad-sense heritability was obtained when RPR was investigated in the silking phase and stages after silking; these stages could be selected to better evaluate RPR for candidate lines. According to the QTL mapping results, RPR as a quantitative trait is controlled by multiple genes with minor effects. However, several QTL hotspots were identified in multiple stages across different environments, which might be able to be applied to develop functional markers to implement MAS for the selection of breeding lines. Furthermore, the annotation of candidate genes was based on the MaizeGDB database, and these candidates were usually involved in the regulation and formation of cell wall components. In addition, various models considering fixed effects or auxiliary variates were implemented to perform cross-validation, which achieved a remarkable improvement in prediction accuracy compared to the univariate model. Finally, the illustration of linkage mapping and genomic selection can provide pertinent suggestions for improving stalk strength and further enhancing lodging-resistance in maize breeding.

Methods

Plant materials

This experiment was performed using two RIL populations. The first was derived from crossing B73 and BY804, which consisted of 188 RILs. B73 is a famous elite line developed from the Iowa Stiff Stalk Synthetic population. BY804 is a special inbred line with a high kernel oil content, which was derived from a Beijing high-oil population. This RIL population was obtained from China Agricultural University. Another RIL population was composed of 215 lines, which were derived from a lodging-resistant maize hybrid with the elite inbred lines Zheng58 and HD568 as parents. Zheng58 is a famous inbred line that is the parent of the widely grown maize hybrid Zhengdan958. HD568 was purposefully selected with the criterion of high stalk strength. Finally, these two RIL populations are abbreviated as HO (high-oil) and LR (lodging-resistance) for simplicity.

Field trial and phenotyping

A randomized incomplete block design with two replicates was implemented for the trial in each year. For the HO population, an experiment was primarily performed in Beijing in the summer of 2012, in which all lines were sown in a single row and the planting density was 49,500 plants per hectare. In addition, both RIL populations were evaluated in Hainan Province in the winter of 2012 and Beijing in the summer of 2013. Furthermore, each line was assigned to a single-row plot, and the planting density was 60,000 plants per hectare. The RPR in seven stages, including the tenth-leaf stage (V10), days to silking (DTS), and 10 to 50 days after silking (AS10, AS20, AS30, AS40 and AS50), was evaluated in the middle of the flat side of the third stalk internode aboveground with an electronic rind penetrometer (AWOS-SL04, Aiwoshi Science & Technology Co., Ltd. Company, Hebei, China). In this experiment, two to five randomly selected plants were used to investigate RPR measurements, and the RPR of each line was determined by the mean of ten measures.

Phenotypic data analysis

Analysis of variance, broad-sense heritability and best linear unbiased estimation.

Analysis of variance (ANOVA) was performed using the aov function in the R package stats version 3.6.0 (R Core Team, 2019). The linear model is as follows:

$$Y_{ijk} = \mu + g_{i} + e_{j} + ge_{ij} + r_{k(j)} + \varepsilon_{ijk},$$

where yijk is the phenotypic values of target trait, μ is the grand mean, gi is the genetic effect of the ith genotype, ej is the environmental effect of the jth environment, geij is the interaction effect between the ith genotype and the jth environment, rjk is the effect of the kth replicate, and εijk is the residual error. Broad-sense heritability was estimated according to the formula:

$$H^{2} = \sigma^{2}_{\text{g}}/\left(\sigma^{2}_{\text{g}}+\sigma^{2}_{\text{ge}}/e + \sigma^{2}_{\varepsilon}/re \right), $$

where \( {\sigma}_{\mathrm{g}}^2 \), \( {\sigma}_{ge}^2 \), and \( {\sigma}_{\varepsilon}^2 \) are the variance components of genotype, genotype by environment interaction and random error, respectively, and r and e are the numbers of replicates and environments, respectively. In addition, the R package lme4 version 1.1–21 was used to perform the best unbiased linear estimation (BLUE) for the genetic effects with the following mixed linear model (MLM) [74, 75]:

$$ {y}_{ij k}=\upmu +{g}_i+{e}_j+g{e}_{ij}+{r}_{jk}+{\varepsilon}_{ij k}, $$

which all components were described based on Liu et al. (2019) [76]. The phenotypic values of RPR determined in the winter of 2012 and the summer of 2013 were jointly used to perform ANOVA and estimate the BLUE values. The BLUE values were used to perform QTL mapping to detect the relationship between H2 and the QTL number.

Construction of the hierarchical clustering of RPR in various stages

For the construction of the hierarchical clustering of RPR in two RIL populations, the BLUE values of each stage were first standardized to a zero mean and unit variance according to the following formula:

$$ {Y}_{ij}=\left({X}_{ij}- mean\left({X}_{.j}\right)\right)/ sd\left({X}_{.j}\right), $$

where Yij is the transformed value, Xij is the BLUE values of the ith genotype in the jth stage, mean() is defined as the mean value, and sd() is the standard deviation. Furthermore, the transformed values of RPR in each stage were used to calculate the Euclidean distances between all pairs of stages, which was performed by the euclidean method with the dist function in the R package stats version 3.6.0. The formula is as follows [43]:

$$ {D}_{AB}={\left({\sum}_{i=1}^{\mathrm{n}}{\left({Y}_{iA}-{Y}_{iB}\right)}^2\right)}^{1/2}, $$

where DAB is the value of the Euclidean distance between stages A and B; YiA and YiB are the transformed values of the ith genotype in stages A and B, respectively; and n is the individual numbers of two RIL populations. The hclust function was used to construct the hierarchical clustering tree based on the distance values of all pairs of seven stages.

Phenotypic and genetic correlation

The phenotypic correlation (rp) is a measurement of the association between the phenotypic values of individuals for a pair of targeted traits. The genetic correlation (rg) is a parameter of genetics that can be used to evaluate the degree of association in genetic variation between two traits. The correlation coefficients are estimated by the following formulae [77,78,79]:

$$ {r}_p= co{v}_p\left(A,B\right)/\sqrt{v_{pA}{v}_{pB}}; $$
$$ {r}_g= co{v}_g\left(A,B\right)/\sqrt{v_{gA}{v}_{gB}}; $$

where covp() and covg() are the phenotypic and genetic covariances between traits, respectively, and Vp and Vg are the phenotypic and genetic variances of target traits, respectively. Phenotypic and genetic correlation analyses were performed using the cor function in the R package stats version 3.6.0 (R Core Team, 2019) and the asreml function in the R package ASReml version 3.0 [80], respectively. Phenotypic data of RPR evaluated in each environment were utilized to estimate the phenotypic correlation coefficient. However, the RPR values determined in the winter of 2012 and the summer of 2013 were used to perform an analysis of genetic correlation.

Genotypic data analysis

Genotyping and quality control

For the HO population, all inbred lines were used for genotyping with the MaizeSNP3K array, which is a subset of the Illumina MaizeSNP50 BeadChip [81]. Markers with missing rates greater than 0.20 and minor allele frequencies (MAFs) less than 0.05 were removed. However, the maize 55 K SNP array [82] was used to genotype all the RILs in the LR population. The process of quality control for genotypic data in LR population was based on Liu et al. (2019) [76]. Moreover, chi-square tests were performed for all the SNPs in each population with the aim of filtering out markers with segregation distortion (P < 0.05) in the two RIL populations.

Construction of the bin map and QTL mapping

Bin markers were detected and aligned with the sliding-window approach, which was applied to identify variant calling errors and evaluate the ratio of SNP alleles derived from the parents. The detailed method of bin map construction was described as Liu et al. (2019) [76], which was based on several studies [83, 84]. The genetic maps of both RIL populations were constructed by the Kosambi mapping function in the mstmap function in the R package ASMap version 1.0–4 [85]. QTL related to RPR were detected by composite interval mapping using the cim function in the R package R/qtl version 1.44–9 [35]. The threshold of logarithm of the odds (LOD), confidence interval of each QTL, and pleiotropic QTL were analyzed and determined according to several studies [76, 83]. Furthermore, the most likely candidate genes within the confidence interval were consulted and selected from the maize genetics and genomics database (MaizeGDB, https://www.maizegdb.org/). Because fewer qualified markers were retained in the HO population, these SNPs were directly used to construct a linkage map and perform further analyses without detecting bin markers. However, the bin markers could be checked and used to perform subsequent analyses in the LR population.

Analysis of GO enrichment and KEGG pathway

Gene ontology (GO) enrichment analysis was performed using singular enrichment analysis (SEA) by AgriGO version 2.0 (http://systemsbiology.cau.edu.cn/agriGOv2/ index.php) with the Fisher statistical test method and Yekutieli multitest adjustment method at a significance level of P < 0.05 [86]. Additionally, the slim of GO function was summarized by GOSlimViewer (https://agbase.arizona.edu/cgi-bin/tools/goslimviewer_select.pl) of AgBase [87]. For Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway annotation, KOBAS version 3.0 (http://kobas.cbi.pku.edu.cn/index.php) was used to perform an analysis of the functional gene set enrichment with Fisher statistical method and the Benjamini and Yekutieli FDR (false discovery rate) correction method (P < 0.05). Significant GO items and KEGG entries were extracted to draw plots.

Genomic selection

A 5-fold cross-validation scheme with 100 replicates was used to assess the performance of each model and calculate the prediction accuracy (rMP), which was the correlation coefficient between genomic-estimated breeding values (GEBVs) and phenotypic values. Three models were used to perform cross-validation, which were developed from the genomic best linear unbiased prediction (GBLUP) model. The univariate model (UV) is essentially a general form of the GBLUP model, and the mixed model is as follows [88, 89]:

$$y = 1_{n}\mu + u + \varepsilon ,$$

where y is a vector (n × 1) of phenotypic values, 1n is a vector (n × 1) of ones, μ is the overall mean, u is the random effects that obeys a normal distribution N(0,G\( {\sigma}_u^2 \)), \( {\sigma}_u^2 \) is the genetic variance, G is the genomic relationship matrix among all genotypes calculated following VanRaden (2008) [88], ε is a vector (n × 1) of random terms with a normal distribution N(0,I\( {\sigma}_{\varepsilon}^2 \)), and I is an identity matrix. In addition, n is the number of individuals. For the GBLUP model including fixed effects (FIXED), the formula can be described as [76]:

$$ \boldsymbol{y}=\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{u}+\varepsilon, $$

where β is the vector (n × 1) of fixed effects, X is the (n × p) design matrix, and the other parameters are identical to the description mentioned above. All markers with peak LOD value of QTL in each target trait were selected to construct the X design matrix, and p denotes the number of target markers. However, the G matrix is calculated by the design matrix containing m-p markers, where m is the number of all markers in each RIL population. Finally, the multivariate model was developed from the univariate model, and the model was as follows [67]:

$$y = 1_{n}\mu + u + v + \varepsilon ,$$

where y is a vector (n × 1) of the target variate, v is the random effects for auxiliary variates with a normal distribution N(0,Gv\( {\sigma}_v^2 \)), \( {\sigma}_v^2 \) is the variance component of v, and the other parameters are identical to the description mentioned above. The Gv is the multivariate relationship matrix, which was calculated as follows: Gv = nMvMv’/trace(MvMv’), where Mv = [y1,y2,…,yi,...,yt-1], yi is a scaled vector (n × 1) of the phenotypic values of the ith environment or stage that were standardized to zero mean and unit variance, t is the number of all variates, and trace denotes the sum of all diagonal elements. The phenotypic data of the t-1 environments or stages are recognized as auxiliary variates in the model. If auxiliary variates were derived from multiple environments, the multivariate model would be abbreviated as ME. For auxiliary variates derived from multiple stages, the model was represented as MS for short. These models were fitted using the R package BGLR version 1.0.8 [90], and the iteration of the Gibbs sampler was set to 10,000, with the first 5000 samples discarded as burn in.