Background

Chlorophyll, which is an essential photosynthetic pigment in the chloroplasts of higher plants, is closely related to leaf photosynthesis and yield potential; it is critical for the accumulation of carbohydrates and contributes to complex processes mediating the acquisition of energy from light and electron transport [1,2,3]. In the Arabidopsis thaliana chlorophyll synthesis pathway, EIN3/EIL1 induces the expression of genes encoding protochlorophyllide oxidoreductase A and B, which cooperatively function with phytochrome-interacting factor 1 (PIF1) [4], thereby preventing seedling photo-oxidation and promoting cotyledon greening [5]. Furthermore, PIF1 can regulate the expression of PORC [6] and interact with gibberellin (GA)-regulated DELLA proteins [7] as well as a transposase-derived transcription factor (i.e., FHY3) to modulate chlorophyll biosynthesis [8]. Another study revealed that BRAHMA encodes the SWI2/SNF2 chromatin-remodeling ATPase that helps to regulate a novel mechanism underlying chlorophyll biosynthesis; compared with wild-type A. thaliana plants, RNA-interference transgenic seedlings of BRAHMA have a higher greening rate under light and accumulate less protochlorophyllide and reactive oxygen species [9]. The chlorophyll content (CC) is related to seedling development and survival [10], but it is also directly or indirectly related to leaf senescence and crop yield [11]. Earlier research on rice demonstrated that OsWRKY5 is a transcription factor that promotes leaf senescence via OsNAC2 [12], which affects abscisic acid-induced leaf senescence and the rice yield [11]. Similarly, the rice leaf CC and photosynthetic efficiency are closely associated with the accumulation of dry matter [13]. Moreover, there is a positive correlation between the grain CC and the grain filling rate [14]. Previous studies on the chlorophyll of Chinese cabbage [3], soybean [15], Brassica napus L. [16], and other plants [17] verified the importance of chlorophyll.

Maize is one of the most widely grown cereal crops worldwide. Accordingly, increasing maize productivity is crucial for agricultural development [18]. The ear leaf is one of the most important leaves of maize plants partly because of its close association with the yield [19]. Recent studies showed that increases in CC and the photosynthetic rate are critical for producing high maize yields [20, 21]. The Soil-Plant Analysis Development (SPAD) value, which may be used to represent the CC, can be determined using a rapid, accurate, and non-destructive measurement method involving the SPAD-502 chlorophyll meter [22]. Although several studies on maize chlorophyll-related genes were conducted recently [23, 24], the molecular mechanism of regulating chlorophyll remains to be elucidated. Therefore, an in-depth analysis of the genetic basis of the maize ear leaf CC is necessary for breeding new maize varieties with efficient photosynthetic activities and for increasing the maize yield.

A genome-wide association study (GWAS) can effectively reveal the genetic basis of complex quantitative traits according to linkage disequilibrium (LD). The advantages of a GWAS over linkage analyses include its higher throughput and greater resolution [25]. For example, YIGE1, which is an important gene for increasing maize ear growth and yield, was cloned following a GWAS [26]. Additionally, CC-related genes were mapped according to a GWAS [27, 28]. In another study, the rice flag leaf CC was determined and GWAS data were combined with high-density markers to detect several significant loci associated with chlorophyll-related traits in different rice subpopulations, including the locus for Ghd7 [29]. The development of statistical methods has enabled researchers to design and apply various GWAS models, including a single-locus model (MLM [25]) and multi-locus models (MLMM [30], SUPER [31], FarmCPU [32], and BLINK [33]). Recently, 3VmrMLM was established as a novel MLM with three variance components for more efficient calculations [34]. This model is useful for identifying quantitative trait nucleotides (QTNs) and revealing QTN-by-environment interactions (QEIs) and QTN-by-QTN interactions.

In this study, the SPAD values of ear leaves from 290 maize inbred lines were determined in three environments and best linear unbiased prediction (BLUP) values were also calculated to represent the maize CC phenotype. A GWAS was performed using 1.25 million SNPs and six models (MLM, MLMM, SUPER, FarmCPU, BLINK, and 3VmrMLM) to elucidate the genetic basis of CC in maize. The results of this study will enrich our understanding of the molecular mechanism underlying the maize CC. The generated data may be used to further characterize the genes related to chlorophyll synthesis, while also providing the theoretical foundation for breeding maize plants with optimal light-use efficiency.

Results

Phenotypic variation

The ear leaf Soil-Plant Analysis Development (SPAD) values [i.e., chlorophyll content (CC)] in Yuanyang (YY), Hebi (HB), and Yongcheng (YC) at 10 days after pollination were analyzed. Additionally, the best linear unbiased prediction (BLUP) values were calculated and correlations were assessed. The SPAD values in YC were lower than those in YY and HB, but similar values were obtained in YY and HB (Table 1; Fig. 1). Among the three environments (i.e., HB, YY, and YC) and BLUP, the standard deviation in the SPAD value was highest for YC and YY (5.38), whereas the standard deviation was lowest for BLUP (3.83). The SPAD values were significantly correlated between any two environments (Fig. 2). Moreover, the absolute values of the skewness and kurtosis of the SPAD value for all environments and BLUP were less than 1 (Table 1) and the SPAD values were normally distributed (Figure S1), which implies that CC is a typical quantitative trait controlled by numerous genes with small effects. The two-way analysis of variance revealed extremely significant genetic and environmental effects on CC. The genotype-by-environment interaction also had a significant effect (Table S1). According to the results, genetic factors influenced the maize CC more than environmental factors; the broad-sense heritability (0.86) was consistent with this observation (Table 1).
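
The skewness and kurtosis screen described above can be sketched as follows. This is a minimal illustration only: the source does not state which estimator was used, so population (biased) moment estimators and excess kurtosis are assumed here, and the function names and the example values are hypothetical.

```python
def shape_statistics(values):
    """Return (skewness, excess kurtosis) using population moments.

    Assumption: biased moment estimators; the paper does not specify
    which estimator produced the values in Table 1.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are required")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("all observations are identical")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skewness, excess_kurtosis


def passes_shape_screen(values, limit=1.0):
    """True when |skewness| and |excess kurtosis| are both below `limit`."""
    skewness, excess_kurtosis = shape_statistics(values)
    return abs(skewness) < limit and abs(excess_kurtosis) < limit


# Hypothetical SPAD-like readings, not data from the study.
example_spad = [48.2, 51.0, 53.5, 55.1, 56.4, 58.0, 60.3, 62.9]
print(shape_statistics(example_spad), passes_shape_screen(example_spad))
```

A screen of this kind is only a rough check; a histogram or QQ plot of the trait values (as in Figure S1) remains the more informative view.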

Table 1 Descriptive statistics of the CC (SPAD value) of maize in different environments
Fig. 1

Variations in the SPAD values (CC) among the maize ear leaves in three environments (HB = Hebi, YY = Yuanyang, and YC = Yongcheng) and variations according to BLUP. The same abbreviations are used in the other figures. The black horizontal line indicates the median

Fig. 2

Pearson coefficients for the CC-related traits of maize lines in different environments. The lines were selected from an association mapping panel

Identification of the QTNs for the CC-related traits based on a GWAS

A GWAS was performed using six models (MLM, BLINK, MLMM, SUPER, FarmCPU, and 3VmrMLM). The QQ plots of all of the models, with the exception of 3VmrMLM, reflected the relative reliability of the GWAS results (Figure S2). Using −log10(p-value) ≥ 5.75 or LOD score ≥ 3 as the significance threshold, we counted the QTNs for the six models in different environments. In addition to the “Single_env” method, the “Multi_env” method of 3VmrMLM was used to detect QTNs. The identified candidate genes were divided according to the models, environments, and methods. In YY, the BLINK, MLMM, SUPER, FarmCPU, and 3VmrMLM (“Single_env”) methods identified 7, 4, 7, 7, and 76 candidate genes, respectively. In YC, 4, 1, 11, and 81 candidate genes were detected using MLM, MLMM, SUPER, and 3VmrMLM, respectively. In HB, 8, 11, 7, 33, 9, and 74 candidate genes were revealed by MLM, BLINK, MLMM, SUPER, FarmCPU, and 3VmrMLM, respectively. For BLUP, 47 and 82 candidate genes were detected by SUPER and 3VmrMLM (“Single_env”), respectively, whereas 3VmrMLM (“Multi_env”) detected 145 candidate genes, including 21 genes associated with QEIs (Table S2, Fig. 3). The R2 values for the QTNs were 0.29–10.28, indicating they explained 0.29–10.28% of the phenotypic variation. The LOD scores for the QTNs detected using 3VmrMLM were 3.16–43.35 (Table S2). Thus, compared with the other models, SUPER and 3VmrMLM detected more candidate genes for the subsequent correlation analysis.

Fig. 3

Number of candidate genes identified in different environments or by different models. “YY”, “YC”, “HB”, “BLUP”, and “Multi” indicate the candidate genes identified in Yuanyang, in Yongcheng, in Hebi, using the best linear unbiased prediction values, and by the 3VmrMLM “Multi_env” method, respectively

Analysis of co-located QTNs

We investigated 19 co-located QTNs across various models, methods, or environments. Of these, four QTNs were detected within the same environment but using different models. Additionally, three QTNs were detected in different environments using the same model. Furthermore, three QTNs were detected in different environments using different models. Three QTNs were found in different environments and analyzed using different methods. Notably, six QTNs were identified in the same environment by both the “Single_env” and “Multi_env” methods of 3VmrMLM. Moreover, three QTNs were detected in different environments using the two methods of 3VmrMLM (Table 2). For each QTN, we defined a 100 kb interval, encompassing 50 kb upstream and 50 kb downstream, as the respective QTL region. Two of the co-located QTNs were within the same QTL, whereas the remaining 17 QTNs were in distinct QTL regions. Of these 18 QTL regions, two lacked identified candidate genes, whereas the remaining 16 QTL regions contained a total of 69 candidate genes (Table 2). The candidate genes detected by each model were analyzed and summarized. In detail, the candidate genes detected using MLM were not co-located with the candidate genes detected by the other five models. Furthermore, we comprehensively evaluated the models, with a specific emphasis on their efficacy in gene localization; 21 candidate genes were co-located genes identified by at least two different models. Notably, SUPER, 3VmrMLM, and FarmCPU were deemed to perform best because they had the highest numbers of co-located candidate genes (Fig. 4A). Additionally, 395 candidate genes were identified exclusively by the 3VmrMLM model. Specifically, 295 genes were identified using the “Single_env” method, whereas 145 genes were identified using the “Multi_env” method. Importantly, 45 candidate genes were detected by both the “Single_env” and “Multi_env” methods of the 3VmrMLM model (Fig. 4B).
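
The QTL-region definition (the QTN position plus 50 kb on each side) and the gene totals for 3VmrMLM can be illustrated with a short sketch. The gene records, their coordinates, and all function names below are hypothetical and are not taken from the B73 annotation; only the two QTN positions and the gene counts come from the text.

```python
from typing import NamedTuple

FLANK_BP = 50_000  # 50 kb on each side of the QTN gives a 100 kb QTL region


class Gene(NamedTuple):
    gene_id: str
    chrom: str
    start: int
    end: int


def qtl_region(position, flank=FLANK_BP):
    """Return the (start, end) of the QTL region around a QTN position."""
    return max(1, position - flank), position + flank


def genes_in_region(chrom, position, genes, flank=FLANK_BP):
    """List the IDs of genes that overlap the QTL region of one QTN."""
    lo, hi = qtl_region(position, flank)
    return [g.gene_id for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


def union_size(size_a, size_b, size_shared):
    """Size of the union of two gene sets (inclusion-exclusion)."""
    return size_a + size_b - size_shared


# Hypothetical gene models used only to exercise the functions.
toy_genes = [
    Gene("geneA", "9", 110_650_000, 110_655_000),
    Gene("geneB", "9", 110_720_000, 110_725_000),
    Gene("geneC", "1", 110_650_000, 110_655_000),
]

# The two co-located QTNs on chromosome 9 reported in the text.
for qtn_pos in (110_657_959, 110_659_989):
    print(qtn_pos, genes_in_region("9", qtn_pos, toy_genes))

# 295 "Single_env" genes, 145 "Multi_env" genes, 45 shared.
print(union_size(295, 145, 45))
```

In this toy example both QTNs recover the same gene because their 100 kb regions almost completely overlap, and the inclusion-exclusion count reproduces the 395 genes reported for 3VmrMLM.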

Table 2 Co-located QTNs and candidate genes as well as the related homologs
Fig. 4

Candidate genes revealed by the models used for the GWAS. (A) Venn diagram of the number of co-located candidate genes detected by five models. (B) Venn diagram of the number of co-located candidate genes detected by the 3VmrMLM “Single_env” and “Multi_env” methods

Candidate gene analysis

The candidate genes identified in Table S2 were categorized into two groups: those with functional annotations and those without functional annotations. We performed Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analyses specifically for the genes with functional annotations. The top 20 GO terms related to biological processes and KEGG pathways were determined, revealing that the candidate genes were predominantly enriched in metabolic processes (Fig. 5). It has been established in previous studies that chlorophyll serves as the primary pigment for photosynthesis. Photosynthesis, being the basis and source of plant metabolism, suggests a close relationship between plant metabolism and chlorophyll [35, 36]. Taking into account the candidate genes located within the co-located QTNs, we conducted a detailed examination and identified corresponding homologs in A. thaliana or rice to further characterize the functional aspects of these candidate genes (Table 2). Based on the aforementioned findings, it is crucial to analyze the key candidate genes within the co-located QTLs. These genes hold significant potential for further research and exploration.

Fig. 5

Enriched GO terms and KEGG pathways among the candidate genes. (A) Enriched GO terms in the biological process. (B) Enriched KEGG pathways

One of the candidate genes, GRMZM2G110408, was associated with two co-located QTNs (chr9.S_110657959 and chr9.S_110659989). This gene encodes a nucleoside triphosphatase involved in chlorophyll biosynthesis. In Arabidopsis thaliana, its homolog has been reported to encode a metalloproteinase that contributes to thylakoid membrane biogenesis [37] and aids in the repair of Photosystem II (PSII) after photoinhibition-related damage. The expression levels of this gene (both transcript and protein levels) increase in response to light; we named it ZmCCS3. Among the annotated candidate genes, two genes (GRMZM2G376595 and GRMZM2G098420) were found to be associated with leaf senescence (GO:0010150, p-value = 0.0052) (Fig. 5A), a process involving changes to chlorophyll [38]. GRMZM2G098420 was co-located by both the “Single_env” and “Multi_env” methods of 3VmrMLM. In Arabidopsis thaliana, its homolog encodes autophagy protein 5, which is part of the Atg12–Atg5–Atg16 complex (cellular component) (Table S3). This suggests its involvement in chlorophyll degradation. In contrast, GRMZM2G376595 encodes a phosphatidylcholine-sterol O-acyltransferase. Another candidate gene, GRMZM2G056920, was co-located in three environments (YC, HB, and BLUP) according to the “Single_env” method and the “Multi_env” method. The LOD scores of the associated QTNs were relatively high (ranging from 7.36 to 15.69). This gene encodes a protein involved in the construction of the cell wall structure. A previous study indicated that the overexpression of its Arabidopsis thaliana homolog, AT3G12130 (KHZ1), significantly promotes leaf senescence [39]. The gene GRMZM2G045314, co-located by four models (BLINK, MLMM, SUPER, and FarmCPU), encodes GTP-binding protein 2. In rice, its homolog is involved in controlling grain development and the grain filling process, directly affecting yield [40]. This suggests that further research on this gene could be valuable for optimizing yield. GRMZM2G093347, co-located by both the “Single_env” and “Multi_env” methods of 3VmrMLM, encodes adenine phosphoribosyl transferase 1 (APT1). An Arabidopsis thaliana mutant lacking APT1 activity exhibited a higher leaf chlorophyll content than the wild type, suggesting that this gene may be associated with chlorophyll accumulation [41].

Apart from the comprehensive analysis of the co-located candidate genes, we observed that 3VmrMLM outperformed the other models in terms of detecting QTNs and identifying more candidate genes associated with CC. For instance, the GO analysis revealed genes enriched in protein localization within chloroplasts (GO:0072598, p-value = 0.0437) (Table S3). One such gene, GRMZM5G839422, was functionally annotated as encoding an inner membrane protein. In an earlier study, a mutant carrying a mutated ALB3 gene (homologous to GRMZM5G839422) exhibited abnormal chloroplasts and a lower CC than the wild-type control [42]. The A. thaliana homolog of GRMZM2G005848 (ARC5) encodes a chloroplast division protein. Mutations in this gene lead to a decrease in the number and size of chloroplasts in mesophyll cells [43]. In cucumbers, the homolog of this gene causes the cucumber peel to appear light green [44]. The functional annotations of this gene include biological processes related to chloroplast fission (GO:0010020, p-value = 0.0240) and organization (GO:0009658, p-value = 0.0431) (Table S3). Searching for homologous genes in Arabidopsis and rice yielded additional findings. GRMZM2G017077 appears to affect chloroplasts and chlorophyll synthesis. The A. thaliana homolog, VIPP1, is a multifunctional protein in chloroplasts with important effects on the envelope [45]. It is also involved in the maintenance of photosynthetic membranes [46]. The gene GRMZM2G135283 falls within the QTL interval of the QTN chr1.S_274280041 (p-value = 1.18 × 10^−12), which is located on chromosome 1 and was detected in YC by the 3VmrMLM model. Its rice homolog, OsSHM1, encodes a serine hydroxymethyltransferase. The CC of the OsSHM1 mutant is significantly lower than that of the wild-type control and the mutant seedlings are less green than the wild-type seedlings during early growth and development [47]. The rice homolog of GRMZM2G171444 encodes a chloroplast precursor, whereas the A. thaliana homolog encodes a chloroplast envelope and matrix protein that influences chlorophyll biosynthesis. However, further investigation and characterization of GRMZM2G171444 are required. Based on the enriched GO terms, KEGG pathways, co-localization results, and earlier studies on the homologs of the candidate genes, we selected 11 candidate genes. Among them, the QTNs associated with GRMZM2G005848 and GRMZM2G098420 were located approximately 20 kb downstream of the two genes, whereas the QTNs of the other nine genes were located within the gene regions. These genes show significant potential for further research into their association with CC.

Analysis of candidate gene expression patterns

We examined the expression profiles of the 11 selected candidate genes in different B73 tissues and constructed a heatmap of the FPKM-based expression levels (Fig. 6). Earlier research confirmed CC is closely related to leaf photosynthesis [48] and directly or indirectly affects leaf senescence [11]. Therefore, we focused on the candidate genes highly expressed in the leaf tissues (S11–S14), especially the mature leaves (S14). The GRMZM2G171444 expression level was significantly higher in S14 than in the other tissues. Moreover, GRMZM2G171444 was the most highly expressed candidate gene in mature leaves (Fig. 6, Table S5). However, GRMZM5G839422, GRMZM2G110408, and GRMZM2G017077 were also highly expressed in S14. The GRMZM2G093347 expression level was high in all tissues. In contrast, the other genes were expressed at relatively low levels in the leaves. Overall, the analysis of candidate gene expression indicated that GRMZM2G171444 and ZmCCS3 may be related to CC and involved in chlorophyll synthesis, but GRMZM5G839422 and GRMZM2G017077 may also influence CC.

Fig. 6

Heatmap of the expression profiles of key candidate genes. S1: 6–7_Internode, S2: 7–8_Internode, S3: Ear_Primordium_2–4_mm, S4: Ear_Primordium_6–8_mm, S5: Embryo_20_DAP, S6: Embryo_38_DAP, S7: Endosperm_12_DAP, S8: Endosperm_Crown_27_DAP, S9: Female_Spikelet_Collected_on_Day_as_Silk, S10: Germinated_Kernels_2_DAI, S11: Leaf_Zone_1_Symmetrical, S12: Leaf_Zone_2_Stomatal, S13: Leaf_Zone_3_Growth, S14: Mature_Leaf_8, S15: Mature_Pollen, S16: Pericarp_Aleurone_27_DAP, S17: Primary_Root_5_Days, S18: Root_Cortex_5_Days, S19: Elongation_Zone_5_Days, S20: Root_Meristem_Zone_5_Days, S21: Secondary_Root_7–8_Days, S22: Silk, S23: Vegetative_Meristem_16–19_Days

Haplotype analysis of candidate genes

We performed a haplotype analysis of the key candidate genes (Table S6). The p-values for the phenotypic differences associated with QTNs were less than 0.01 for chr3.S_17284190 (LOD = 6.7012, p-value = 2.77 × 10^−8) and chr3.S_168368384 (LOD = 15.2228, p-value = 5.60 × 10^−17) within GRMZM5G839422 and GRMZM2G017077, respectively, whereas they were less than 0.001 for chr5.S_215364939 (p-value = 6.40 × 10^−7) and less than 0.0001 for the remaining eight QTNs. Accordingly, the phenotypic differences related to the SNPs in the 11 key candidate genes were all extremely significant (Figure S3, Figs. 7B, 8B, and 9B). Next, we extracted all of the polymorphic loci within the QTLs containing the significant QTNs of the key candidate genes for the LD analysis (Figure S4). There was a strong linkage relationship between the significant QTNs of several candidate genes and the polymorphic sites (Figs. 8C and 9C). The linkage relationship was especially strong for GRMZM2G110408, which was detected using SUPER (BLUP and HB) and FarmCPU (HB) (Fig. 7).
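
The pairwise LD statistic (r2) between a significant QTN and another polymorphic site in the same QTL can be sketched as below. This is an illustration under stated assumptions rather than the authors' procedure: the lines are treated as fully homozygous inbreds, each SNP is coded 0/1 for its two alleles, there are no missing calls, and the function name and example genotypes are hypothetical.

```python
def ld_r2(snp_a, snp_b):
    """Squared allelic correlation (r2) between two biallelic SNPs.

    Assumptions: one 0/1 allele code per inbred line (homozygous lines),
    equal-length inputs, no missing genotypes.
    """
    n = len(snp_a)
    if n == 0 or n != len(snp_b):
        raise ValueError("inputs must be non-empty and of equal length")
    p_a = sum(snp_a) / n                       # allele frequency at SNP A
    p_b = sum(snp_b) / n                       # allele frequency at SNP B
    p_ab = sum(a * b for a, b in zip(snp_a, snp_b)) / n  # haplotype frequency
    d = p_ab - p_a * p_b                       # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical genotypes for eight inbred lines.
qtn = [0, 0, 0, 1, 1, 1, 1, 0]
nearby_site = [0, 0, 0, 1, 1, 1, 0, 0]
print(round(ld_r2(qtn, nearby_site), 3))
```

Values of r2 close to 1 indicate that the two sites are nearly always inherited together in the panel, which is the sense in which a "strong linkage relationship" is used above.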

Fig. 7

GRMZM2G110408 (ZmCCS3) affects CC-related traits. (A) Manhattan plot of the SPAD values for BLUP and HB. The line represents the threshold − log10(p-value) ≥ 5.75 (p ≤ 1.79 × 10− 6). (B) Differences in the CC-related traits of haplotypes 1 and 2. (C) R2 values for all SNPs in the QTL of the significant QTN

Fig. 8

GRMZM2G171444 affects CC-related traits. (A) Manhattan plot of the SPAD values for YC. The line represents the threshold LOD score ≥ 3.0. (B) Differences in the CC-related traits of haplotypes 1 and 2. (C) R2 values for all SNPs in the QTL of the significant QTN

Fig. 9

GRMZM2G135283 affects CC-related traits. (A) Manhattan plot of the SPAD values for YC. The line represents the threshold LOD score ≥ 3.0. (B) Differences in the CC-related traits of haplotypes 1 and 2. (C) R2 values for all SNPs in the QTL of the significant QTN

Discussion

Chlorophyll is critical for photosynthesis. Changes in CC directly affect the leaf photosynthetic efficiency, which ultimately influences the crop yield [48,49,50]. Maize plants can efficiently use light energy and accumulate a large amount of dry matter through photosynthesis [51]. Hence, the maize yield is greatly affected by photosynthesis, but it is also considerably influenced by leaf senescence if chlorophyll is degraded [38]. A recent study demonstrated that adjusting the timing of maize leaf senescence can substantially modify the maize yield [52]. There has been extensive research on chlorophyll-related genes, especially those associated with leaf senescence [53,54,55]. However, compared with other plants, there have been relatively few related studies on maize. Because of the considerable interest in chlorophyll synthesis, the underlying mechanism is continually being clarified [56, 57], but the molecular mechanism regulating CC will need to be further analyzed. In addition, there is growing interest in stay-green traits among crop breeders [58, 59]. Therefore, studying the genetic basis of maize CC, identifying important genetic variants, and mining-related candidate genes are crucial for the genetic improvement of maize via breeding.

Genome-wide association studies have been conducted to elucidate the genetic basis of complex quantitative traits and to screen for genes related to agronomic traits [60, 61]. In the current study, the normal distribution of CC indicates that it is jointly determined by multiple genetic variants (Figure S1), and its heritability was calculated to be 0.86 (Table S4). Next, MLM [25], MLMM [30], SUPER [31], FarmCPU [32], BLINK [33], and 3VmrMLM [34] were used to analyze the ear leaf CC of 290 maize inbred lines at 10 days after pollination. More specifically, a GWAS was completed using 1.25 million high-density markers and the CC (SPAD values) (Figure S5). Finally, 140 significant QTNs and 481 genes were identified (Table S1). There were 19 significant co-located QTNs, of which 10 were detected by at least two different models or methods, three were detected in at least two different environments, and six were identified by different models or methods in different environments (Table 2). The statistical analysis of the number of candidate genes detected by each model indicated that 3VmrMLM and SUPER detected the most candidate genes. Moreover, co-located candidate genes were detected by all of the models, except for MLM, with 45 co-located genes revealed by the two 3VmrMLM methods (Figs. 3 and 4; Table 2). These results may be relevant for identifying genes significantly associated with CC. Relevant published reports and the functionally annotated A. thaliana and rice homologs were used to clarify the functions of the co-located genes (Table 2). The enriched GO terms and KEGG pathways were considered and 11 candidate genes were selected for the subsequent analyses (i.e., expression analysis, haplotype analysis, and LD analysis of candidate genes). On the basis of the results of these analyses, we identified GRMZM2G110408, GRMZM2G171444, and GRMZM2G135283 as candidate genes related to the maize ear leaf CC. Of these genes, we suggest ZmCCS3 may be the most worthwhile candidate gene for future investigations. Among the 11 key candidate genes, the GO analysis indicated GRMZM5G839422 likely encodes a protein localized in chloroplasts (Table S3). A mutation to the A. thaliana homolog (ALB3) of GRMZM5G839422 reportedly leads to a decrease in CC [42]. In the present study, GRMZM5G839422 was most highly expressed in the mature leaf stage (Fig. 6, Table S5). The GO terms assigned to GRMZM2G005848 suggested the encoded protein is also associated with chloroplasts. Consistent with this observation, the A. thaliana homolog of this gene encodes a chloroplast protein [43]. The GRMZM2G017077 expression level was relatively high (Fig. 6, Table S5). Additionally, its A. thaliana homolog VIPP1 encodes a multifunctional chloroplast protein that may affect chloroplast functions and CC [45]. The GO analysis of GRMZM2G376595 and GRMZM2G098420 indicated that these two genes contribute to leaf senescence (biological process) (Fig. 5A). Moreover, GRMZM2G098420 encodes autophagy protein 5. The overexpression of KHZ1, which is the A. thaliana homolog of GRMZM2G056920, can significantly promote leaf senescence [39]. We propose that these three genes may be involved in chlorophyll degradation. Both GRMZM2G045314 and GRMZM2G093347 were among the co-located genes. The rice homolog of GRMZM2G045314 is related to grain development (e.g., grain filling stage) [40]. Earlier research showed that a mutation to APT1, which is a homolog of GRMZM2G093347, enhances the accumulation of chlorophyll in leaves [41]. These functions are closely related to CC, implying these genes should be more precisely characterized in future studies.

In this study, GRMZM2G171444 was more highly expressed than the other candidate genes in the mature leaf stage (Fig. 6, Table S5). A significant QTN (chr5.S_190752068) in this gene was detected by 3VmrMLM in YC, with a LOD score of 11.45, suggestive of its importance (Table S2, Fig. 8A). This significant QTN was used for a haplotype analysis [62], which revealed that the phenotypic difference between the two haplotypes of this gene was significant (p-value = 4.18 × 10^−5) (Fig. 8B). The A. thaliana and rice homologs of this gene encoding a ribosome protein have not been identified, but the A. thaliana and rice databases suggest the homologs may be involved in the synthesis of chloroplast precursors. This possibility will need to be experimentally verified. The GRMZM2G135283 candidate gene detected by 3VmrMLM in YC contained a significant QTN (chr1.S_274280041) with a LOD score of 10.9716 (Table S2, Fig. 9A). The phenotypes associated with the two haplotypes of this QTN differed significantly (p-value = 2.22 × 10^−5) (Fig. 9B). Because the LD analysis detected a strong linkage relationship, we speculate that the candidate gene is highly correlated with chlorophyll traits (Fig. 9C). We also determined that GRMZM2G135283 encodes a serine hydroxymethyltransferase. A mutation to the rice homolog (OsSHM1) of this gene does not affect seed germination, but the mutant leaves are less green and have a lower CC than the wild-type leaves. Additionally, this rice homolog is expressed in all examined tissues (i.e., root, stem, leaf, and young ear), but especially in the leaves [47]. In the current study, 3VmrMLM detected more noteworthy candidate genes than the other models. Finally, and most importantly, our findings imply that ZmCCS3 should be examined more comprehensively in future investigations. In particular, this gene was identified on the basis of two co-located QTNs (chr9.S_110657959 and chr9.S_110659989).
The p-values for chr9.S_110657959 in the BLUP and HB environments of SUPER were 6.9 × 10^−7 and 2.5 × 10^−7, respectively, which differed from the corresponding p-value in the HB environment of FarmCPU (1.3 × 10^−6). The p-values for chr9.S_110659989 in the BLUP and HB environments of SUPER were 1.7 × 10^−6 and 9.3 × 10^−7, respectively (Fig. 7A, Table S2). Furthermore, chr9.S_110657959, which was detected by the two models (SUPER and FarmCPU), was selected as a significant QTN for the haplotype analysis. The significance of the phenotypic difference between the two haplotypes was the highest among the 11 candidate genes (p-value = 2.92 × 10^−10) (Fig. 7B), with haplotype 1 detected as the favorable haplotype. Of the 290 maize materials included in this study, 210 were temperate lines and 80 were tropical/subtropical lines, with the latter accounting for 27.59% of the examined materials. A total of 214 materials contained this haplotype, with tropical/subtropical materials accounting for 25.23% (54/214) of the total. The findings of this study suggest that CC might be related to the germplasm type. Specifically, germplasm from temperate regions may have a higher CC than germplasm from other regions. Moreover, the LD analysis indicated that chr9.S_110657959 in ZmCCS3 had a strong linkage relationship with polymorphic sites (Fig. 7C), implying this gene might be highly correlated with CC. The A. thaliana homolog of this gene encodes a metalloproteinase localized in the thylakoid membrane, wherein it repairs PSII adversely affected by photoinhibition [37]. The functional annotation showed that the nucleoside triphosphatase encoded by ZmCCS3 is involved in chlorophyll biosynthesis (Table S2). Overall, we identified 11 candidate genes encoding proteins with regulatory effects on CC. Furthermore, we propose that ZmCCS3 is critical for the regulation of CC.
The data generated in this study may provide the basis for future research aimed at improving the photosynthetic efficiency of maize and breeding ideotype-based maize varieties suitable for commercial cultivation.

Conclusions

Our study compared the results of six GWAS models (a single-locus model and five multi-locus models), screened candidate genes within the range of co-located QTNs, and combined functional annotation with GO and KEGG analyses to mine 11 CC-related key candidate genes. Based on the haplotype and LD analysis results of these key genes, GRMZM2G110408 (ZmCCS3) is considered worthy of further study. This finding broadens the understanding of the genetic basis of CC and may be relevant for the ideotype-based breeding of new maize varieties with high photosynthetic efficiency.

Materials and methods

Experimental materials and field cultivation

The association mapping panel used in this study, which consisted of 290 maize inbred lines (210 temperate lines and 80 tropical/subtropical lines), was derived from 540 inbred lines [62, 63] and was provided by Professor Yan Jianbing of Huazhong Agricultural University. All 290 maize inbred lines were grown at the Yuanyang Modern Agricultural Science and Technology Park of Henan Agricultural University (Yuanyang; N35°, E113°; i.e., YY), the XunXian Experimental Station of the Hebi Academy of Agricultural Sciences in Henan province (Hebi; N35°, E114°; i.e., HB), and the Cotton Seed Farm in Yongcheng, Henan (Yongcheng; N33°, E116°; i.e., YC) in the summer of 2019. A randomized complete block design with two replicates was used. Specifically, each line was grown in two rows (4 m long), with 67 cm between rows and 10 plants per row. Routine field management practices were applied during the cultivation of open-pollinated plants.

Determination of CC

For 10 days after pollination, five plants per row were randomly selected to determine the daily SPAD value (i.e., CC) for the ear leaf at 9:00–11:30. Briefly, a hand-held SPAD-502Plus chlorophyll meter (Minolta Corporation, Ltd., Osaka, Japan) was used to measure the CC at three points of the ear leaf. Each plant was analyzed three times (error was less than 5%), after which the average value was recorded as the leaf CC. Finally, the average CC of five plants was used as the ear leaf CC of each inbred line for the general statistical analysis of the phenotype and the GWAS. The raw phenotypic data for the CC of the 290 maize inbred lines in different environments and the best linear unbiased prediction (BLUP) values are provided in Table S7.

Data processing and analysis

Microsoft Excel 2021 was used for the general statistical analysis of the variance in the SPAD values among the ear leaves in different environments. The data were visualized using RStudio and Origin 2021 (https://www.originlab.com/2021). The corr function of R (version 4.2.2) was used to analyze the correlation between the SPAD values for different environments. The BLUP values for all materials in the three environments were calculated using a mixed linear model (MLM) fitted with the R package lme4 [64, 65]. The BLUP values were also used for the general statistical analysis and the subsequent GWAS because they can reduce the prediction bias caused by unbalanced data [65]. The broad-sense heritability of the SPAD value was determined using R and the following formula:

$$H^{2}=\delta_{G}^{2}/\left[\delta_{G}^{2}+\left(\delta_{GE}^{2}/n\right)+\delta_{e}^{2}/\left(nr\right)\right]$$

where \({\delta }_{G}^{2}\) is the genotypic variance, \({\delta }_{GE}^{2}\) is the variance in the genotype-by-environment interaction, \({\delta }_{e}^{2}\) is the error variance, \(r\) is the number of replicates in an environment, and \(n\) is the number of environments.
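As a minimal sketch of how this formula is evaluated, the function below plugs variance components into the expression for H². The variance components used in the example are hypothetical placeholders, not estimates from this study; in practice they would come from the fitted mixed model.

```python
def broad_sense_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Evaluate H^2 = var_g / [var_g + var_ge / n + var_e / (n * r)].

    var_g  : genotypic variance
    var_ge : genotype-by-environment interaction variance
    var_e  : error variance
    n_env  : number of environments (n)
    n_rep  : number of replicates per environment (r)
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    denominator = var_g + var_ge / n_env + var_e / (n_env * n_rep)
    return var_g / denominator


# Hypothetical variance components with n = 3 environments and r = 2 replicates.
h2 = broad_sense_heritability(var_g=10.0, var_ge=3.0, var_e=6.0, n_env=3, n_rep=2)
print(round(h2, 3))
```

Dividing the interaction variance by n and the error variance by nr reflects that heritability here is expressed on a line-mean basis across environments and replicates.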

Genome-wide association study

The genotype data obtained from the Maizego database (http://www.maizego.org/Resources.html) consisted of 1.25 million SNPs (B73_RefGen_v2) that covered the whole maize genome, with a minor allele frequency ≥ 0.05 [63]. Here, six models were implemented for the GWAS: one single-locus model, namely the Mixed Linear Model (MLM), and five multi-locus models, namely Bayesian-information and Linkage-disequilibrium Iteratively Nested Keyway (BLINK), Multiple Loci Mixed Model (MLMM), Fixed and random model Circulating Probability Unification (FarmCPU), Settlement of MLM Under Progressively Exclusive Relationship (SUPER), and 3 Variance-component multi-locus random-SNP-effect Mixed Linear Model (3VmrMLM). Specifically, MLM was implemented using TASSEL 5.0, whereas BLINK, MLMM, SUPER, and FarmCPU were implemented using the R package GAPIT. The detected SNPs were referred to as lead SNPs. To determine whether there were false positives or negatives, Quantile–Quantile (QQ) plots for the five models (MLM, BLINK, MLMM, FarmCPU, and SUPER) were compared [66]. Because of the LD among SNP markers, the genome-wide threshold for detecting significant SNP–trait associations was based on the effective number of independent markers (En). The suggested p-value threshold (1.79 × 10⁻⁶; 1/En), which corresponds to −log10(p-value) ≥ 5.75, was calculated using a reported En (557,894) [67] after the quality control step.
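The threshold arithmetic can be checked with a few lines of code. The sketch below simply computes 1/En and its −log10 transform for the reported En; the function names are illustrative and do not belong to TASSEL or GAPIT.

```python
import math


def suggestive_threshold(effective_markers):
    """Return (p-value cut-off, -log10 cut-off) for the 1/En criterion."""
    p_cutoff = 1.0 / effective_markers
    return p_cutoff, -math.log10(p_cutoff)


def is_significant(p_value, effective_markers):
    """True if a SNP-trait association passes the 1/En threshold."""
    p_cutoff, _ = suggestive_threshold(effective_markers)
    return p_value <= p_cutoff


p_cut, neg_log10_cut = suggestive_threshold(557_894)
print(f"{p_cut:.2e}", round(neg_log10_cut, 2))  # 1.79e-06 5.75
```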

The recently published 3VmrMLM method was implemented using the IIIVmrMLM software [34], which is available on GitHub (https://github.com/YuanmingZhang65/IIIVmrMLM). The main-effect QTNs and QEIs in this model were detected using “Single_env” and “Multi_env”, respectively, with the following parameters: SearchRadius = 50; svpal = 0.01; and LOD score ≥ 3. Moreover, Manhattan and QQ plots were generated using the default parameters of the R package CMplot (https://github.com/YinLiLin/R-CMplot) and 3VmrMLM.

Analyses of candidate genes

For each QTN, a 100 kb interval (50 kb upstream and downstream of the significant SNP) was defined as a QTL because the LD decay distance was approximately 50 kb in the association mapping panel, and the candidate genes within all QTLs were searched. We sorted and summarized the candidate genes in the corresponding QTLs among the models, methods, and environments and then analyzed the homologs of these candidate genes in A. thaliana and rice (Oryza sativa) using MaizeGDB (http://www.maizegdb.org), NCBI (www.ncbi.nlm.nih.gov), RiceData (https://ricedata.cn), and Phytozome v13 (https://phytozome-next.jgi.doe.gov). The candidate genes related to CC were then functionally annotated.
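The interval-based search can be sketched as follows, assuming a simple in-memory gene annotation. The `Gene` record, the gene identifiers, and the coordinates are hypothetical; the actual search relied on the B73_RefGen_v2 annotation and the databases listed above. A gene is counted as a candidate here if it overlaps the interval at all, which is one possible reading of "within".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    gene_id: str
    chrom: str
    start: int  # 1-based start coordinate (bp)
    end: int    # inclusive end coordinate (bp)


def qtl_interval(snp_pos, flank=50_000):
    """Return the (start, end) of the QTL: the SNP position +/- flank bp."""
    return max(1, snp_pos - flank), snp_pos + flank


def genes_in_qtl(chrom, snp_pos, genes, flank=50_000):
    """List IDs of genes on `chrom` that overlap the QTL around the SNP."""
    lo, hi = qtl_interval(snp_pos, flank)
    return [
        g.gene_id
        for g in genes
        if g.chrom == chrom and g.start <= hi and g.end >= lo
    ]


# Hypothetical annotation records, for illustration only.
annotation = [
    Gene("geneA", "1", 100_000, 102_000),
    Gene("geneB", "1", 200_000, 205_000),
    Gene("geneC", "2", 100_000, 101_000),
]
print(genes_in_qtl("1", 140_000, annotation))  # ['geneA']
```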

The Gene Ontology (GO) analysis and the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis of the candidate genes were performed using OmicShare tools (https://www.omicshare.com/tools) [68]. The principle is as follows. First, genes obtained from the GWAS results that had unknown functions were excluded from further analysis. The remaining genes were mapped to the corresponding terms in the GO database (http://www.geneontology.org/), enabling us to assign specific GO terms to each gene. Subsequently, the number of genes associated with each GO term was computed, which produced a gene list and the frequency of genes linked to each GO function.

To identify GO entries that were significantly enriched among the target genes (the candidate genes in this study; termed differentially expressed genes in the description of the formula) relative to the whole-genome background, we employed a hypergeometric test. This test evaluates whether the observed number of genes associated with a particular GO term is significantly higher than would be expected by chance alone. The formula for the hypergeometric test is as follows [69]:

$$P = 1 - \sum\limits_{i = 0}^{m - 1} \frac{\binom{M}{i}\binom{N - M}{n - i}}{\binom{N}{n}}$$

where P is the p-value, which represents the probability of observing m or more genes associated with a specific GO term; N is the number of all unigenes with a GO annotation; n is the number of differentially expressed genes in N; M is the number of all unigenes annotated with a specific GO term; and m is the number of differentially expressed genes annotated with that GO term. This test enabled us to identify the GO entries that were significantly enriched among the differentially expressed genes, thereby providing insights into their functional implications.

The significance threshold was set to p ≤ 0.05. A GO term that satisfied this condition was defined as significantly enriched among the differentially expressed genes. For the KEGG analysis [112], significance was determined using the same formula as for GO, where N is the total number of genes (background genes), n is the number of differentially expressed genes (target genes), M is the number of genes annotated to a specific pathway among all genes, and m is the number of target genes annotated to that pathway. If the p-value is ≤ 0.05, the pathway is considered significantly enriched among the differentially expressed genes.
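The enrichment test can be written out directly from the formula. The sketch below is not the OmicShare implementation; it only evaluates P = 1 − Σ (i = 0 to m − 1) C(M, i) C(N − M, n − i) / C(N, n) with exact integer arithmetic, and the counts in the example are hypothetical.

```python
from fractions import Fraction
from math import comb


def enrichment_p_value(N, M, n, m):
    """Upper-tail hypergeometric p-value: P(X >= m).

    N: background genes with an annotation
    M: background genes annotated to the term (or pathway)
    n: target genes among the N background genes
    m: target genes annotated to the term (or pathway)
    """
    if not (0 <= M <= N and 0 <= n <= N and 0 <= m <= min(M, n)):
        raise ValueError("counts are inconsistent")
    total = comb(N, n)
    # Sum of the probabilities of observing 0 .. m-1 annotated target genes.
    lower_tail = sum(comb(M, i) * comb(N - M, n - i) for i in range(m))
    # Fraction keeps the subtraction exact before converting to float.
    return float(1 - Fraction(lower_tail, total))


# Hypothetical counts: 10 background genes, 4 in the term, 3 targets, 2 hits.
p = enrichment_p_value(N=10, M=4, n=3, m=2)
print(round(p, 4), p <= 0.05)
```

With these toy counts the term would not pass the p ≤ 0.05 cut-off, because two hits among three target genes is not unusual when 4 of the 10 background genes carry the annotation.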

Linkage disequilibrium analysis

The LD analysis was performed using all SNPs within the QTL containing significant QTNs. The heatmaps of the LD were constructed using LDBlockShow (Dong et al., 2021), which is available online (https://github.com/BGI-shenzhen/LDBlockShow).

Haplotype analysis

The SNP haplotype analysis was performed for the candidate genes most likely related to CC after the comprehensive analysis. For each candidate gene, the inbred lines were divided into two haplotypes according to their genotype at the significant QTN through which the gene was detected. The SPAD values for the environments in which the significant QTN of the candidate gene was detected were used as the phenotypic data. A t-test was performed on the phenotypic data to determine whether the two haplotypes differed significantly, and the results were plotted using Origin 2021 (https://www.originlab.com/2021).
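A minimal sketch of the haplotype comparison is given below. It groups lines by their allele at the QTN and computes a two-sample t statistic. The text does not state which t-test variant was used, so the Welch (unequal-variance) form here is an assumption, and the line names, alleles, and SPAD values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def split_by_allele(genotypes, phenotypes):
    """Group phenotype values by the allele each line carries at the QTN.

    genotypes : dict mapping line name -> allele (None if missing)
    phenotypes: dict mapping line name -> SPAD value
    """
    groups = {}
    for line, allele in genotypes.items():
        if allele is not None and line in phenotypes:
            groups.setdefault(allele, []).append(phenotypes[line])
    return groups


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical genotypes at one QTN and SPAD values for six lines.
geno = {"L1": "A", "L2": "A", "L3": "A", "L4": "G", "L5": "G", "L6": "G"}
spad = {"L1": 51.0, "L2": 52.0, "L3": 53.0, "L4": 55.0, "L5": 56.0, "L6": 57.0}
hap = split_by_allele(geno, spad)
t_stat, dof = welch_t(hap["A"], hap["G"])
print(round(t_stat, 3), round(dof, 1))
```

The p-value follows from the t distribution with the returned degrees of freedom, for example with a statistics library such as SciPy.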

Analysis of candidate gene expression patterns

To analyze candidate gene expression patterns, the expression data for the different samples in B73 available online (http://www.zeamap.com/) were compared. The heatmap of the FPKM-based expression levels for the key candidate genes was drawn using the Python package seaborn (https://seaborn.pydata.org/index.html).