Background

Maize (Zea mays L.), one of the most fundamental crops in the world, plays a crucial role in food, feed and industrial production. The natural population of maize shows abundant phenotypic and genotypic variation, which greatly facilitates the study of the relationship between genotype and phenotypic diversity [1]. Determining the allelic variation of important agronomic traits not only helps to analyze the genetic basis of these traits, but also provides effective gene resources and molecular markers for marker-assisted selection (MAS) [2].

Developments in association analysis have heightened the need for analyzing the genetic basis of complex quantitative traits [3]. Association analysis based on a natural population and linkage disequilibrium (LD) can directly identify genes related to phenotypic variation by combining the variation of target traits with genetic polymorphism [4, 5]. A wide range of genetic materials can be used simultaneously to examine the associated sites and alleles of most quantitative trait loci (QTL), without being limited to the two parents of a traditional mapping population. Because LD decays over a very short distance after many rounds of recombination, association analysis achieves higher mapping accuracy [6]. With the development of high-throughput sequencing and other biological technologies, genome-wide association studies (GWAS) have proven to be a useful approach for identifying genes, alleles or haplotypes related to agronomic traits under complex environments. The approach relies on the LD between haplotype loci and the target trait. GWAS has been widely used in maize genetics, providing many opportunities to further understand the genetic basis of complex quantitative traits in maize. Liu et al. (2016) identified four starch content-related SNPs on chromosomes 1, 2 and 5, together with 77 starch synthesis-related genes, using 263 maize inbred lines [7]. In another GWAS based on genotyping of a natural population, a significant SNP for starch content within the ORF region of GRMZM5G852704_T01 colocalized with the QTL Qsta9.1, which is located in a 1.7 Mb interval on chromosome 9 [8]. Xu et al. (2018) identified 60 quantitative trait nucleotides (QTNs) for seven pasting properties of maize starch through GWAS with a panel of 230 inbred lines and 145,232 SNPs [9].

Starch is a polymeric carbohydrate consisting of numerous glucose units joined by glycosidic bonds. This polysaccharide is produced by most green plants for energy storage. Plants produce starch by first converting glucose 1-phosphate to ADP-glucose using the enzyme glucose-1-phosphate adenylyltransferase. This step requires energy in the form of ATP. The enzyme starch synthase then adds the ADP-glucose via a 1,4-alpha glycosidic bond to a growing chain of glucose residues, liberating ADP and creating amylose. The ADP-glucose is almost certainly added to the non-reducing end of the amylose polymer, as the UDP-glucose is added to the non-reducing end of glycogen during glycogen synthesis [10]. Moreover, many genes have been found to contribute to starch biosynthesis in maize, and these genes are regulated by a complex regulatory network [11].

Starch pasting properties are a critical index for measuring the quality of starch and have an important effect on the application and processing of starch. Therefore, understanding the pasting properties of starch is an important basis for its application [9]. The peak viscosity of starch is determined by the friction between starch granules after they swell in water and by the resulting increase in viscosity, and it reflects the expansibility of starch. The trough viscosity is due to the bursting of starch granules after the expansibility reaches its limit, reflecting the shear resistance of starch at high temperatures. The final viscosity is due to the further increase in viscosity caused by the movement of water molecules surrounding amylose and amylopectin; this property reflects the hardness of starch at room temperature. Breakdown represents the change in stability, reflecting the shear resistance of starch at high temperatures. Setback reflects the degree of starch aging. The pasting properties of starch are closely related to the molecular size of amylose and the branch chain length of amylopectin [12].

At present, the study of maize starch is mainly focused on the analysis and evaluation of applied quality, and traditional QTL mapping is used to locate related genes [13, 14]. However, the mapping interval is relatively large. To date, few studies have performed a GWAS of starch pasting properties and discovered candidate genes. In this study, a genome-wide association study was performed based on a MaizeSNP50 BeadChip composed of 55,126 SNPs and the phenotypic data of 292 maize inbred lines. The aims of this study were to detect genes related to pasting properties in maize, and to provide an important theoretical basis for maize quality breeding.

Results

Phenotypic variations analysis and genome-wide association study of quality traits

The quality traits in maize are under the control of many factors. In this study, the statistical results for the quality trait phenotypes are listed in Tables 1 and 2. In the four environments, the average protein contents were 11.52%, 11.01%, 11.87%, and 11.34%. The average starch contents were 70.46%, 71.54%, 70.24%, and 70.63%. The average oil contents were 4.54%, 4.35%, 4.35% and 4.35%. The data for each trait approximately followed a normal distribution, and the absolute values of the kurtosis and skewness in these environments were less than 1; thus, the phenotypic data were suitable for GWAS and further analysis. In the four environments, there was a significant positive correlation between protein content and oil content, except at Luoyang; starch content showed a significant negative correlation with protein content and oil content. The heritabilities of protein, starch and oil content were 82.73%, 85.82% and 80.69%, respectively.
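The normality screen described above can be illustrated with a minimal sketch. It assumes simple moment-based estimators of skewness and excess kurtosis, which may differ slightly from the bias-corrected statistics reported by SPSS; the function names and the example values are hypothetical and are not taken from the study data.

```python
from statistics import mean


def skewness_and_excess_kurtosis(values):
    """Return (skewness, excess kurtosis) from population moments.

    Assumption: plain moment estimators, not the bias-corrected
    versions that statistical packages such as SPSS report.
    """
    n = len(values)
    m = mean(values)
    m2 = sum((v - m) ** 2 for v in values) / n  # variance
    m3 = sum((v - m) ** 3 for v in values) / n  # third central moment
    m4 = sum((v - m) ** 4 for v in values) / n  # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt


def roughly_normal(values, limit=1.0):
    """Apply the |skewness| < 1 and |kurtosis| < 1 screen used in the text."""
    skew, kurt = skewness_and_excess_kurtosis(values)
    return abs(skew) < limit and abs(kurt) < limit


# Hypothetical trait scores for nine lines (illustrative only).
example_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5]
example_ok = roughly_normal(example_scores)
```

A screen of this kind is only a coarse check; it flags strongly skewed or heavy-tailed traits but does not replace a formal normality test.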

Table 1 Statistical analysis of maize quality traits in different environments
Table 2 Correlation analysis of maize quality traits in different environments

To find the SNPs (single nucleotide polymorphisms) related to quality traits, the genotype data of the 25,331 SNPs and the phenotypic data of the 292 maize inbred lines were used for a genome-wide association study. The analysis identified 26 SNPs at the P < 10⁻⁴ level, based on the FarmCPU method (Fig. 1) [15]. In the four different environments, 8, 11 and 7 SNPs were found to be associated with protein, starch and oil content, respectively. PZE_106054189 at Bin6.04, detected in 2015Luoyang and 2016Jiaozhou, was correlated with starch content. PZE_108135907 and PZE_109032161, which were correlated with protein and starch content, were detected at Bin8.09 and Bin9.03, respectively. PZE_105086878, PZE_106067078, SYN3414 and PZE_106054189, which were correlated with starch and oil content, were detected at Bin5.04, Bin6.01 and Bin6.04 (Table 3).

Fig. 1 Manhattan plot and Q-Q plot from the genome-wide association study. A protein content; B starch content; C oil content

Table 3 The SNPs associated with quality traits (P < 10⁻⁴)

Genome-wide association study of starch content

To identify the SNPs related to starch content, a genome-wide association study was carried out using the phenotypic data from the four environments. In total, 37 SNPs were associated with starch content across the four environments: nine and eight SNPs were identified in 2015 at Luoyang and Qingzhou, respectively, and six and 14 SNPs were identified at Jiaozhou in 2016 and 2017, respectively. In addition, two SNPs, PZE_108135907 and PZE_109032161, were associated with both starch content and protein content. Based on the gene annotation in the MaizeGDB database [16], the SNPs associated with starch content were related to various metabolic or signaling pathways.

Phenotypic variations analysis and genome-wide association study of starch pasting properties

The statistical results for the starch pasting property phenotypes are listed in Tables 4 and 5. The data for each trait approximately followed a normal distribution, and the absolute values of the kurtosis and skewness in these environments were less than 1. In the four environments, significant positive correlations were observed between any two parameters among PV, TV, BD, FV, and SB; PT was positively correlated with PTP; and BD was negatively correlated with PT and PTP. The heritability values of PV, TV, BD, FV, SB, PT and PTP were 87.98%, 82.14%, 80.45%, 87.98%, 87.56%, 80.24% and 89.43%, respectively.

Table 4 Statistical analysis of pasting properties of maize kernels in different environments
Table 5 Correlation analysis of pasting properties in different environments

To find the SNPs related to starch pasting properties, the data for the 25,331 SNPs and the starch pasting properties were analyzed with the FarmCPU software. Significantly correlated SNPs were identified at the P < 10⁻⁴ level, and the candidate genes were identified (Fig. 2). A total of 48 SNPs correlated with pasting properties were detected in the four environments: 5, 7, 6, 9, 8, 8 and 5 SNPs for PV, TV, BD, FV, SB, PT and PTP, respectively. PZE_101122760, PZE_103046325, PZE_104089684, PZE_106039028, SYN26334 and PZE_110040421 were correlated with FV and SB; PZE_103091447 and PZE_105156016 were correlated with PV and TV; PZE_103096842 was correlated with PV and FV; and PZE_106067257 was correlated with TV and FV (Table 6).
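The thresholding and the search for SNPs shared between traits can be sketched as follows. This is a minimal illustration that assumes the association results are available as (trait, SNP, P value) records; the record layout, the function names and the example identifiers and P values are hypothetical, and the code does not reproduce the FarmCPU analysis itself.

```python
from collections import defaultdict

P_THRESHOLD = 1e-4  # significance level stated in the text


def significant_snps(results, threshold=P_THRESHOLD):
    """Group SNP identifiers by trait, keeping those with P below the threshold.

    `results` is an iterable of (trait, snp_id, p_value) tuples.
    """
    hits = defaultdict(set)
    for trait, snp_id, p_value in results:
        if p_value < threshold:
            hits[trait].add(snp_id)
    return dict(hits)


def shared_snps(hits):
    """Return SNPs detected for two or more traits, with their sorted trait lists."""
    traits_by_snp = defaultdict(set)
    for trait, snp_ids in hits.items():
        for snp_id in snp_ids:
            traits_by_snp[snp_id].add(trait)
    return {
        snp_id: sorted(traits)
        for snp_id, traits in traits_by_snp.items()
        if len(traits) > 1
    }


# Hypothetical records (identifiers and P values are illustrative only).
example_results = [
    ("FV", "snp_a", 2e-5),
    ("SB", "snp_a", 5e-5),
    ("PV", "snp_b", 3e-3),  # not significant at 1e-4
    ("PV", "snp_c", 9e-5),
]
example_hits = significant_snps(example_results)
example_shared = shared_snps(example_hits)
```

Grouping by trait first keeps the per-trait counts easy to read off, and the second pass makes pleiotropic candidates explicit.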

Fig. 2 Manhattan plot and Q-Q plot of pasting properties from the genome-wide association study. A peak viscosity; B trough viscosity; C breakdown; D final viscosity; E setback; F peak time; G pasting temperature

Table 6 The SNPs associated with pasting properties (P < 10⁻⁴)

GO analysis of candidate genes

Based on the genome-wide association study results, 26 and 37 candidate genes were found to be related to starch content and starch pasting properties, respectively (Tables 7 and 8). To gain insight into the functions of the identified candidate genes, Gene Ontology (GO) term enrichment analysis was performed with the ShinyGO database [17]. For starch content, the annotated results were classified into two parts: biological process (16 categories) and molecular function (20 categories) (Fig. 3). In biological process, the fold enrichment values of triglyceride biosynthetic process, neutral lipid biosynthetic process and acylglycerol biosynthetic process reached 631, 553 and 552, respectively. In addition, diacylglycerol O-acyltransferase activity (fold enrichment of 1104) was one of the most enriched categories in molecular function. For starch pasting properties, 64 biological process categories and 18 molecular function categories were identified (Fig. 4). In biological process, the fold enrichment values of positive regulation of biological process, positive regulation of cellular process, positive regulation of cellular metabolic process and positive regulation of nitrogen compound metabolic process were 816, 900, 711 and 691, respectively. Moreover, in molecular function, the fold enrichment values of ligase activity, actin binding and identical protein binding were 642, 164 and 99, respectively.
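Very large fold enrichment values are expected when a short candidate gene list overlaps a rarely annotated GO term. The sketch below assumes the usual definition of fold enrichment, namely the fraction of query genes annotated to a term divided by the fraction of background genes annotated to the same term; the function name and all counts in the example are hypothetical and are not taken from the ShinyGO output of this study.

```python
def fold_enrichment(hits_in_list, list_size, hits_in_background, background_size):
    """Fold enrichment of a GO term in a query gene list.

    Assumed definition: (hits_in_list / list_size) divided by
    (hits_in_background / background_size).
    """
    if list_size <= 0 or background_size <= 0 or hits_in_background <= 0:
        raise ValueError("list, background and term sizes must be positive")
    query_fraction = hits_in_list / list_size
    background_fraction = hits_in_background / background_size
    return query_fraction / background_fraction


# Hypothetical counts: 2 of 26 query genes fall in a term that has
# 10 annotated genes among 39,000 background genes.
example_fold = fold_enrichment(2, 26, 10, 39000)
```

With these illustrative counts the ratio is about 300, which shows how even one or two candidate genes in a small GO category can produce enrichment values in the hundreds.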

Table 7 Information of candidate gene associated with starch content
Table 8 Information of candidate gene associated with pasting properties
Fig. 3 Gene ontology (GO) enriched terms associated with differentially expressed genes (DEGs) in starch content

Fig. 4 Gene ontology (GO) enriched terms associated with differentially expressed genes (DEGs) in starch pasting properties

Discussion

Starch, or amylum, is a polymeric carbohydrate consisting of numerous glucose units joined by glycosidic bonds [18]. This polysaccharide is produced by most plants for energy storage. In plants, excess glucose is converted into starch, which is structurally more complex than glucose. Starch biosynthesis is a complex process. Glucose 1-phosphate is first converted to ADP-glucose by the enzyme glucose-1-phosphate adenylyltransferase. Starch synthase then adds the ADP-glucose via a 1,4-alpha glycosidic bond to a growing chain of glucose residues, liberating ADP and creating starch. Starch content in maize kernels is a complex trait [19]. In this study, the heritabilities of starch, protein and oil content were 85.82%, 82.73% and 80.69%, respectively, which indicates the important role of genotype in the expression of these traits and in maize breeding. Identifying the key genes related to variation in starch content and pasting properties can help to clarify the genetic background of starch quantity and maize kernel quality and to expand the applications of maize starch. In addition, the SNPs for starch content and pasting properties found in this study can provide useful markers for marker-assisted selection in maize.

In this study, we identified 37 SNPs and 26 candidate genes for starch content through GWAS in the 292 inbred lines. In addition, 48 SNPs correlated with pasting properties were detected. The GO analysis indicated that some carbohydrate metabolism-related processes, such as the triglyceride, neutral lipid and acylglycerol biosynthetic processes, have an important influence on starch content. This is consistent with previous studies, in which many carbohydrate metabolism-related QTLs or genes were found to participate in starch metabolism [20,21,22,23].

When we compare the genes identified here with previously identified QTLs or genes for starch content [7, 8, 21, 23,24,25,26,27,28], we note that the starch content-related genes identified in different studies differ. This could be the result of differences in population size, genetic background, statistical analysis methods, environmental effects, etc. In addition, some auxin-related genes were detected in this study, such as indole-3-acetic acid amido synthetase GH3.6 [29] and rho GTPase-activating protein [30], in accordance with previous studies showing that auxin participates in starch metabolism [31, 32]. These findings indicate that a complex regulatory network is related to starch content, and that starch content could be regulated by different genes under different environments.

To investigate the molecular mechanism of starch pasting properties in maize, we further examined the locations of the associated SNPs for possible candidate genes. In this study, we identified 48 SNPs and 37 genes that were correlated with starch pasting properties. According to functional annotations, these candidate genes were primarily categorized into various biological processes and molecular functions, such as positive regulation of cellular process, positive regulation of cellular metabolic process, positive regulation of nitrogen compound metabolic process, ligase activity, actin binding and identical protein binding. Transcription factors, including AP2/EREBP and NAC, were detected in this study. Some of the candidate genes or their homologous genes are known to be linked to carbohydrate metabolism. For example, ZmNAC34, a maize NAC transcription factor, negatively regulates starch synthesis in rice [33]. WRINKLED1 (WRI1) belongs to the AP2/EREBP transcription factor family and functions in fatty acid synthesis in dicots [34].

Conclusions

Our study extends current knowledge of maize starch metabolism and starch pasting properties. In total, 26 and 37 candidate genes were found to be related to starch content and starch pasting properties, respectively, indicating a complex regulatory network controlling starch content and starch pasting properties in maize. The results also indicated that this regulatory network could differ between environmental conditions. This finding reflects the complex nature of maize starch metabolism, which depends on a large number of different environment-related genes.

Materials and methods

Plant material and field design

A population composed of 292 maize inbred lines (obtained from Qingdao Agricultural University; Table 9) belonging to four subgroups (Lancaster, Lvdahonggu, P group, and Sipingtou) was used for GWAS. The 292 maize inbred lines were grown with three replications in four environments in China: Qingzhou (Shandong Province) in 2015 (2015QZ), Luoyang (Henan Province) in 2015 (2015LY), and Jiaozhou (Shandong Province) in 2016 and 2017 (2016JZ and 2017JZ). The materials were arranged in a randomized complete block design, and each inbred line was grown in a single row measuring 3 m in length and 0.6 m in width, with 15 individual plants per row. Five to eight plants in each row were self-pollinated when silks had emerged on more than 80% of the plants. After maturity, the ears were harvested and naturally dried. The dried ears (water content < 14%) of each plot were shelled manually and bulked for kernel composition trait tests. Pasting properties were measured using a Rapid Visco Analyzer (RVA, Model 3D, Perten, Sweden) and analyzed using Thermal Cycle for Windows (TCW) software. The sample suspension of each inbred line was incubated at 50 °C for 1 min; the temperature was then increased to 95 °C, maintained for 2.5 min, and finally cooled to 50 °C and maintained for 1 min. Three primary RVA parameters, peak viscosity (PV), trough viscosity (TV), and final viscosity (FV), were obtained from the pasting curve. Two secondary RVA parameters, breakdown (BD = PV − TV) and setback (SB = FV − TV), were calculated from the primary parameters. Peak time (PT) and pasting temperature (PTP) were also recorded. Trait measurements averaged over the three replications were used as the preliminary data.
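The derivation of the secondary RVA parameters from the replicate means can be sketched as follows. This is a minimal illustration that assumes each replicate is stored as a dictionary with the keys PV, TV and FV; the function name, the data layout and the viscosity values are hypothetical and are not measurements from this study.

```python
from statistics import mean


def summarize_rva(replicates):
    """Average the primary RVA parameters over replicates and derive BD and SB.

    `replicates` is a list of dicts with keys "PV", "TV" and "FV"
    for one inbred line in one environment.
    """
    pv = mean(rep["PV"] for rep in replicates)
    tv = mean(rep["TV"] for rep in replicates)
    fv = mean(rep["FV"] for rep in replicates)
    return {
        "PV": pv,
        "TV": tv,
        "FV": fv,
        "BD": pv - tv,  # breakdown = peak viscosity - trough viscosity
        "SB": fv - tv,  # setback = final viscosity - trough viscosity
    }


# Hypothetical viscosity readings for three replications (illustrative only).
example_replicates = [
    {"PV": 3000, "TV": 2000, "FV": 3500},
    {"PV": 3100, "TV": 2100, "FV": 3600},
    {"PV": 2900, "TV": 1900, "FV": 3400},
]
example_summary = summarize_rva(example_replicates)
```

Because BD and SB are linear in PV, TV and FV, computing them from the replicate means gives the same result as averaging per-replicate BD and SB values.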

Table 9 List of 292 maize inbred lines

Analysis of phenotypic data

All analyses were performed using the statistical analysis software package IBM SPSS Statistics 20.0. The broad-sense heritability (H²) was calculated as follows: \({\mathrm{H}}^{2}={\upsigma }_{\mathrm{g}}^{2}/({\upsigma }_{\mathrm{g}}^{2}+{\upsigma }_{\mathrm{gl}}^{2}/\mathrm{n}+{\upsigma }_{\mathrm{e}}^{2}/\mathrm{nr})\), where \({\upsigma }_{\mathrm{g}}^{2}\), \({\upsigma }_{\mathrm{gl}}^{2}\) and \({\upsigma }_{\mathrm{e}}^{2}\) were estimates of the genotype, genotype × environment interaction and experimental error variances, and n and r were the numbers of environments and replications, respectively [35].
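The heritability formula above translates directly into a short function. The sketch below assumes that the three variance components have already been estimated (for example from an analysis of variance); the function name and the numerical variance components are hypothetical and are chosen only to make the arithmetic easy to follow.

```python
def broad_sense_heritability(var_g, var_gl, var_e, n_env, n_rep):
    """Broad-sense heritability on an entry-mean basis.

    H^2 = var_g / (var_g + var_gl / n + var_e / (n * r)), where
    var_g  = genotypic variance,
    var_gl = genotype x environment interaction variance,
    var_e  = experimental error variance,
    n_env  = number of environments (n),
    n_rep  = number of replications (r).
    """
    if n_env <= 0 or n_rep <= 0:
        raise ValueError("n_env and n_rep must be positive")
    denominator = var_g + var_gl / n_env + var_e / (n_env * n_rep)
    return var_g / denominator


# Hypothetical variance components with four environments and three
# replications, matching the design described in the text:
# 4.0 / (4.0 + 2.0 / 4 + 6.0 / 12) = 4.0 / 5.0
example_h2 = broad_sense_heritability(4.0, 2.0, 6.0, n_env=4, n_rep=3)
```

The interaction and error terms shrink as environments and replications are added, which is why multi-environment trials yield higher entry-mean heritability than a single trial with the same variance components.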

DNA extraction and SNP genotyping

DNA for SNP genotyping was extracted from a seedling of each line by the CTAB method [36]. A total of 55,126 SNPs were selected from the whole maize genome and genotyped with the MaizeSNP50 BeadChip from Pioneer DuPont (U.S.). After excluding SNPs with a missing rate > 20%, heterozygosity > 10% or minor allele frequency (MAF) < 0.05, the remaining 25,331 SNPs were used for GWAS.
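The three quality-control filters can be sketched for a single biallelic SNP as follows. This is a minimal illustration under stated assumptions: genotype calls are two-character strings such as "AA" or "AG", missing calls are None, and heterozygosity and MAF are computed over the called genotypes only. The function name and the data encoding are hypothetical; the thresholds are those given in the text.

```python
def snp_passes_qc(calls, max_missing=0.20, max_het=0.10, min_maf=0.05):
    """Return True if a biallelic SNP passes the three filters in the text.

    `calls` holds one genotype per inbred line: a two-character string
    (e.g. "AA", "AG") or None for a missing call. Assumption: the
    heterozygosity rate and MAF are computed over called genotypes only.
    """
    n_total = len(calls)
    called = [c for c in calls if c is not None]
    if n_total == 0 or not called:
        return False

    missing_rate = (n_total - len(called)) / n_total
    het_rate = sum(1 for c in called if c[0] != c[1]) / len(called)

    # Count alleles across all called genotypes.
    allele_counts = {}
    for genotype in called:
        for allele in genotype:
            allele_counts[allele] = allele_counts.get(allele, 0) + 1
    if len(allele_counts) < 2:
        maf = 0.0  # monomorphic marker
    else:
        maf = min(allele_counts.values()) / (2 * len(called))

    return missing_rate <= max_missing and het_rate <= max_het and maf >= min_maf


# Hypothetical genotype vectors for ten lines (illustrative only).
example_pass = snp_passes_qc(["AA"] * 6 + ["GG"] * 4)
example_fail_missing = snp_passes_qc(["AA"] * 4 + ["GG"] * 3 + [None] * 3)
```

Applying the function to every marker and keeping those that return True would reproduce the kind of reduction described above, from the full chip content to the filtered marker set.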

Association analysis

The SNPs from the 292 inbred lines were analyzed with FarmCPU (Fixed and Random Model Circulating Probability Unification), which uses a Fixed Effect Model (FEM) and a Random Effect Model (REM) alternately. The source code of the algorithm (http://zzlab.net/FarmCPU/FarmCPU_functions.txt) was invoked through the GAPIT package in R [Zhu et al. 2018]. The population structure was assessed with unlinked markers (r² = 0.1) using STRUCTURE ver. 2.3.4 [37], and the number of genetic clusters was determined from the highest delta K value [38].
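The r² statistic used to select unlinked markers can be illustrated with a short sketch. It assumes genotypes coded as allele counts (0, 1 or 2 copies of one allele, with None for missing calls) and computes r² as the squared Pearson correlation between two markers; how the r² = 0.1 cutoff was applied in the original analysis is not described in the text, so the pruning rule in the comment is only an assumption. The function name and the example genotype vectors are hypothetical.

```python
def ld_r2(geno_a, geno_b):
    """Squared Pearson correlation between two markers coded as allele counts.

    Lines with a missing call (None) at either marker are skipped.
    Returns 0.0 if either marker is monomorphic among the shared calls.
    """
    pairs = [(a, b) for a, b in zip(geno_a, geno_b) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return 0.0
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    if var_a == 0 or var_b == 0:
        return 0.0
    return (cov * cov) / (var_a * var_b)


# Hypothetical genotype vectors for four inbred lines (illustrative only).
# Assumed pruning rule: treat two markers as unlinked if r2 is at most 0.1.
example_linked = ld_r2([0, 0, 2, 2], [0, 0, 2, 2])
example_unlinked = ld_r2([0, 0, 2, 2], [0, 2, 0, 2])
```

Restricting the structure analysis to weakly correlated markers avoids giving extra weight to genomic regions that are densely covered by the chip.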

Candidate genes analysis

Based on the results, SNPs associated with starch pasting properties were identified. In this study, the genome of maize line B73 was used as the reference genome for candidate gene analysis [39, 40]. The positions and functions of the genes were annotated according to the MaizeGDB database (http://www.maizegdb.org/) (references) and the NCBI database (http://www.ncbi.nlm.nih.gov/) (references). The ShinyGO database (http://bioinformatics.sdstate.edu/go/) was used for GO analysis of the candidate genes [17].