Association analysis of candidate genes for maysin and chlorogenic acid accumulation in maize silks
- Cite this article as:
- Szalma, S.J., Buckler, E.S., Snook, M.E. et al. Theor Appl Genet (2005) 110: 1324. doi:10.1007/s00122-005-1973-0
Two compounds, the C-glycosyl flavone maysin and the phenylpropanoid product chlorogenic acid (CGA), have been implicated in corn earworm (Helicoverpa zea Boddie) resistance in maize (Zea mays L.). Previous quantitative trait locus (QTL) analyses identified the pericarp color (p) locus, which encodes a transcription factor, as the major QTL for maysin and CGA. QTL analysis has also implicated the dihydroflavanol reductase (DFR; E.C. no. 1.1.1.219) locus anthocyaninless1 (a1) and the duplicate chalcone synthase (CHS; E.C. no. 2.3.1.74) loci colorless2 (c2) and white pollen1 (whp1) as genes underlying QTL for maysin and/or CGA synthesis. Epistatic interactions between p and a1 and between p and c2 were also defined. CHS catalyzes the first step in the flavonoid pathway and represents one of the first enzyme steps following the branch off the general phenylpropanoid pathway towards CGA synthesis. In maize, the reduction of dihydroflavanol to leucoanthocyanin by DFR immediately follows the pathway branch leading to C-glycosyl flavone production. The detection of QTLs for maysin and CGA concentration at loci encoding enzyme steps following the pathway branch points implicates alterations in the flow of biochemical intermediates as the biological basis of the QTL effects. To examine whether sequence variation among alleles of a1, c2, and whp1 affects maysin and CGA synthesis in maize silks, we performed an association analysis. Because the p locus has often been a major QTL for maysin and CGA and has exhibited epistatic interactions with a1, c2, and whp1, association analysis was conditioned on the p genotype. A highly significant association of two sequence polymorphisms in the promoter of a1 with maysin synthesis was demonstrated. Additional conditioning on the genotype of the significant a1 polymorphism allowed the detection of a significant polymorphism within the whp1 promoter. Our analyses demonstrate that conditioning for epistatic factors greatly increases the power of association testing.
The phenylpropanoid and flavonoid pathways have been widely studied in plants, including maize (Zea mays L.) (Coe et al. 1988; Harbourne 1988; Grotewold et al. 1998; McMullen et al. 1998), Arabidopsis thaliana (Burbulis and Winkel-Shirley 1999; Xie et al. 2003), snapdragon (Antirrhinum majus) (Moyano et al. 1996; Tamagnone et al. 1998; Schwarz-Sommer et al. 2003), and the Solanaceae (Kroon et al. 1994; De Jong et al. 2003). A wide array of regulatory and structural genes of flavonoid biosynthesis have been cloned in these species. The phenylpropanoid compound chlorogenic acid (CGA) and the C-glycosyl flavone maysin have been implicated in corn earworm (Helicoverpa zea Boddie) antibiosis (Waiss et al. 1979; Elliger et al. 1980; Isman and Duffy 1983). By determining the genetic basis of the variation in the accumulation of maysin and CGA in maize silks, we both advance our understanding of the flavonoid pathway as a model genetic system and promote crop improvement.
Quantitative trait locus (QTL) analysis has been employed using populations constructed to maximize phenotypic variance for C-glycosyl flavone and/or CGA accumulation (Byrne et al. 1996b, 1998; Lee et al. 1998; McMullen et al. 1998; Bushman et al. 2002). One striking discovery was that regulatory loci explained a tremendous amount of phenotypic variation. A region on chromosome 1 containing the pericarp color (p) locus was identified in several populations as the QTL explaining the largest amount of phenotypic variation for both maysin and CGA accumulation (McMullen et al. 1998, 2001a). The p locus encodes the duplicate myb-like transcription factors p1 and p2 (Zhang et al. 2000). Additional support for the effect of p on flavone and phenylpropanoid accumulation has come from transformation studies (Grotewold et al. 1998; Dong et al. 2001). Any investigation of the expression of the structural genes within this pathway requires careful monitoring of the p genotype.
The role that genetic variation of structural genes in the flavonoid pathway plays in maysin and CGA synthesis remains a question. Byrne et al. (1998) reported that the whp1 region was a QTL for maysin and corn earworm antibiosis. Szalma et al. (2002) demonstrated that increased dosage of functional alleles at c2 and whp1 had a positive effect on maysin accumulation and a negative effect on CGA, presumably by modulating substrate flow through alternative pathways. McMullen et al. (2001b) also showed a similar genetic effect for the interaction of the flavone and 3-deoxyanthocyanin pathways, as the elimination of DFR function by a mutation at a1 resulted in increased maysin at the expense of 3-deoxyanthocyanins, in a p-dependent manner. The BANYULS gene in Arabidopsis thaliana and an equivalent gene in Medicago truncatula, which encode an anthocyanidin reductase involved in the production of condensed tannins, also lie at a branch point of the flavonoid pathway, and their expression was shown to alter the relative amounts of anthocyanins and condensed tannins (Xie et al. 2003). Although a1, whp1, and c2 can all be made into QTLs by the use of non-functional alleles, we have never seen QTLs of comparable magnitude for these loci in QTL experiments with standard inbred lines. Consequently, the question remains: does the natural sequence variation present at c2, whp1, and a1 in diverse maize lines affect maysin and/or CGA synthesis?
Association mapping, also referred to as linkage disequilibrium (LD) mapping, is a method that can test the relationship of specific sequence polymorphisms in candidate genes to phenotypic variation (Thornsberry et al. 2001). The non-random association of polymorphisms within a locus that defines alleles in a population constitutes LD (Flint-Garcia et al. 2003). The rate of LD decay in maize is rapid, generally within 1,000–1,500 bp (Remington et al. 2001; Tenaillon et al. 2001). Therefore, this within-gene resolution of association mapping represents a much greater precision than QTL mapping and provides an independent approach to test candidate genes identified in standard QTL experiments. In addition, because association analysis links specific nucleotide polymorphisms to trait variation, one may often hypothesize specific biological effects for significant polymorphisms. A potential obstacle in association studies is the spurious association of polymorphisms with traits due to relatedness rather than sequence function. This is especially true in maize because lines have varying degrees of shared pedigree histories. False positives due to population structure can be reduced by including in the association model a vector quantity for sub-population membership derived from simple sequence repeat (SSR) information (Pritchard and Rosenberg 1999; Pritchard et al. 2000a, 2000b; Thornsberry et al. 2001). For this study investigating the natural allelic variation at a1, c2, and whp1, the association model was further enhanced by the inclusion of genotype-class variables for epistatic factors.
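The notion of LD between two polymorphic sites can be made concrete with a small calculation. The following is a minimal sketch, assuming biallelic sites scored on inbred lines (so each line contributes one haplotype) and using the common r² summary of LD; the function name and the toy allele codes are hypothetical and are not taken from the study or from tassel.

```python
def ld_r_squared(site_a, site_b):
    """Return r^2 between two biallelic sites scored on the same inbred lines.

    Each argument is a list with one allele per line (None = missing call).
    Inbred lines are treated as haplotypes, an assumption of this sketch.
    """
    # Keep only lines with a call at both sites.
    pairs = [(a, b) for a, b in zip(site_a, site_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no lines with calls at both sites")

    # Use the alleles of the first line as the reference alleles.
    a_ref, b_ref = pairs[0]
    p_a = sum(1 for a, _ in pairs if a == a_ref) / n
    p_b = sum(1 for _, b in pairs if b == b_ref) / n
    p_ab = sum(1 for a, b in pairs if a == a_ref and b == b_ref) / n

    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("at least one site is monomorphic")
    return d * d / denom


if __name__ == "__main__":
    # Two sites in complete LD, then two independent sites (toy data).
    print(ld_r_squared(list("AAGG"), list("++--")))
    print(ld_r_squared(list("AAGG"), list("+-+-")))
```

Values near 1 indicate that the two sites carry nearly the same information, which is why effects of tightly linked polymorphisms are hard to separate in an association test.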
The sequence variation present in the a1 promoter and the c2 and whp1 loci was characterized. After determining the constitution of, and correcting for the effect of, the various p alleles, association tests were performed to examine the correlation of discrete sequence polymorphisms in c2, whp1, and a1 with maysin and CGA accumulation.
Materials and methods
The 86 maize inbred lines used in this study represent a broad spectrum of the available maize diversity (Remington et al. 2001). On the basis of SSR marker information, this collection of lines (see electronic supplemental material, ESM-S1) has previously been divided into three sub-populations: Stiff Stalk (SS), non-Stiff Stalk (NSS), and sub-tropical/tropical (ST) (Thornsberry et al. 2001). Only this set of lines with sub-population definition was used in structured association tests involving c2, whp1, and a1 with maysin and CGA. Twenty-five near-isogenic lines containing different alleles of p in the 4Co63 background (Brink and Styles 1966) (ESM-S2) and six additional lines utilized by our group in previous QTL studies (ESM-S3) were included in experiments to define the relationship of p gene structure with phenotype. Plants were grown during the summer of 2001 at the University of Missouri Genetics Research Farm, Columbia, Mo., USA. Two replications of all lines were grown in a common field to increase the sample size and ensure the collection of multiple silk samples from each line. Leaf tissue for DNA extraction was collected from whorl leaves 1 month after planting.
Silk collection and analysis
Primary ear shoots were covered prior to silk emergence to prevent pollination. Silks were collected 2 days post-emergence from five plants of each line in each replication for an average of nine silks per line (low=5 silks, high=10 silks). Silk browning (Byrne et al. 1996a), pericarp pigmentation, and cob coloration, which are under the influence of the p locus, were noted when applicable (ESM-S1). Chemical analysis of the silks was performed using reversed-phase high-performance liquid chromatography on individual samples following extraction of the silks in methanol at 0°C for 14 days (Snook et al. 1989, 1993). The concentrations of maysin and CGA were determined and expressed as the percentage of fresh silk weight for each silk mass.
Molecular structure classification of the p allele
PCR primers EP5-8, EP3-13, and P2-5 were utilized to amplify either p2 specifically or both p1 and p2 (Zhang et al. 2000). The combination of primers EP5-8 and EP3-13 yields an approximately 380-bp fragment for p1 and an approximately 300-bp fragment for p2. The combination of the P2-5 and EP3-13 primers yields an approximately 240-bp fragment for p2 and no fragment for p1. Using these two sets of primers, we classified lines as ‘p1 only’, ‘p2 only’, or ‘p1 and p2’. A fourth, novel amplification pattern was seen with many p-www alleles: no amplification with P2-5 and EP3-13, and a single, large (>2 kb) fragment produced with the EP5-8 and EP3-13 primers. This pattern was designated as the ‘A619 type’.
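The classification rules described above can be summarized as a small decision function. This is only an illustrative sketch: the function name, the input format (lists of observed fragment sizes in bp), and the ±30-bp tolerance used to match the "approximately" sized bands are assumptions, not part of the published protocol.

```python
def classify_p_allele(ep58_ep313_sizes, p25_ep313_sizes, tol=30):
    """Classify a line from fragment sizes (bp) seen with the two primer pairs.

    ep58_ep313_sizes: fragments amplified with EP5-8 + EP3-13
    p25_ep313_sizes:  fragments amplified with P2-5 + EP3-13
    tol:              hypothetical size tolerance for calling a band
    """
    def has_band(sizes, expected):
        return any(abs(s - expected) <= tol for s in sizes)

    # 'A619 type': nothing with P2-5/EP3-13 and one large (>2 kb) fragment
    # with EP5-8/EP3-13.
    if (not p25_ep313_sizes and len(ep58_ep313_sizes) == 1
            and ep58_ep313_sizes[0] > 2000):
        return "A619 type"

    # p1 gives a ~380-bp band with EP5-8/EP3-13 only.
    p1 = has_band(ep58_ep313_sizes, 380)
    # p2 gives a ~300-bp band with EP5-8/EP3-13 and a ~240-bp band with
    # P2-5/EP3-13; either band is accepted as evidence in this sketch.
    p2 = has_band(ep58_ep313_sizes, 300) or has_band(p25_ep313_sizes, 240)

    if p1 and p2:
        return "p1 and p2"
    if p1:
        return "p1 only"
    if p2:
        return "p2 only"
    return "unclassified"


if __name__ == "__main__":
    print(classify_p_allele([380], []))
    print(classify_p_allele([380, 300], [240]))
    print(classify_p_allele([2500], []))
```

Checking the 'A619 type' pattern first keeps the large fragment from being confused with a failed p1 or p2 amplification.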
DNA extraction and sequencing
Extraction, purification, and quantification of DNA were performed according to methods published in the University of Missouri-Columbia RFLP Procedures Manual (http://www.maizemap.org, confirmed August, 2004). Primers specific to regions of the genes to be sequenced were designed using primer3 software (http://frodo.wi.mit.edu/cgi-bin/primer3/primer3.cgi/, confirmed August, 2004). The fragments to be sequenced ranged between 500 bp and 1,400 bp. To ensure specificity in amplification between the duplicate CHS loci, we anchored at least one primer for each fragment in an intron. The quality of the amplification products was checked on 1% (w/v) agarose gels. Generally, amplification conditions for the sequencing templates followed those described at http://www.maizegdb.org/ssr.php (confirmed August, 2004), or they were modified to extension times of 3 min.
The precipitation of PCR products was performed prior to and following the sequencing reactions and consisted of 1 volume 2 mM MgCl2 and 2.5 volumes 95% (v/v) EtOH and a 15- to 30-min incubation followed by centrifugation at 1,924 g for 30 min. The pellets were washed with 5 volumes 70% (v/v) EtOH for 10 min and centrifuged for 15 min at 1,924 g. All stages of precipitation and the reagents were at room temperature. Sequencing reactions were performed with the dRhodamine Terminator Ready Reaction Sequencing kit according to manufacturer’s recommendations [Applied Biosystems (ABI), Foster City, Calif.]. The samples were denatured in 10 μl deionized formamide at 95°C for 4 min and then cooled to 4°C before analysis on the ABI 3100 sequencer. Two or three replications of sequencing were performed in both forward and reverse directions for each fragment to provide a twofold to sixfold redundancy for each fragment in each line. Sequence quality was assessed manually with the seqman application of the dnastar software suite (DNASTAR, Madison, Wis.) and with the assistance of phred software (Ewing and Green 1998; Ewing et al. 1998). Prior GenBank sequence information was used as a framework upon which to assemble data (c2, no. X60205; whp1, no. X60204; a1, no. X05068). Contiguous fragments were assembled manually with seqman or phrap (Phil Green, University of Washington, 2000). Several large insertions in the intron of c2 relative to the reported sequence hindered assembly in some lines. The sequence of the c2-Idf allele, provided by C. Della-Vedova, University of Missouri (personal communication), proved useful in completing sequence assembly. Initial sequence alignments were obtained with clustalx for the Microsoft Windows platform and the clustalw protocol (Thompson et al. 1994) within the megalign application of the dnastar software suite. 
Alignments were edited manually with sequence quality scores provided by phred and by visual examination of the original trace files for each line. The a1 promoter was sequenced in 86 lines (ESM-S1) for the region (1,358–1,761 bp) of the reported sequence (no. X05068). The promoter and coding regions of whp1 corresponding to the regions 29–908 of no. X60204 were sequenced in 77 lines, and the second exon (3,007–3,810 bp) was obtained in 64 lines (ESM-S1). A total of 3.4 kb of sequence was obtained for the c2 locus corresponding to base pairs 331–3,778 of the GenBank accession (no. X60205) in 84 lines (ESM-S1). The sequences for the a1, c2, and whp1 alleles determined in this study have been deposited in GenBank with the accession numbers (c2, AY728808-AY728890; a1, AY730781-AY730865; whp1, AY731288-AY731363).
The sas for windows ver. 8.2 software package (SAS Institute, Cary, N.C.) was used to conduct the analysis of variance (anova) for the preliminary statistical analysis of trait data. The tassel software package (http://www.maizegenetics.net/bioinformatics/tasselindex.htm, confirmed August, 2004) was used to identify single nucleotide polymorphisms (SNPs) and insertions/deletions (indels). Polymorphisms were tested by association analysis if high-quality sequences were available for at least 90% of the lines sequenced for that gene and if there was an allele frequency of at least 10%. Tests for LD and associations of polymorphisms with trait values were also conducted with tassel. Mean maysin and CGA contents of each line were used in the association analyses. Association analyses were conditioned on population structure estimates from SSR data, the functional versus non-functional classification of the p region, and significant polymorphisms at structural loci (Thornsberry et al. 2001). The test statistic has been described by Thornsberry et al. (2001). To select an appropriate significance threshold, data were permuted 1,000 times. A polymorphic site was deemed to have a significant association if the logistic regression P-value was below the 5% empirically derived value.
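The site-filtering criteria and the permutation step can be illustrated with a short sketch. It is not the tassel procedure: the test statistic here is a plain absolute difference in mean trait value between two allele classes, substituted for the structured-association statistic of Thornsberry et al. (2001), and it ignores population structure and the p genotype. The function names are hypothetical, and the 10% frequency rule is interpreted as a minor-allele frequency.

```python
import random
from collections import Counter


def passes_filters(alleles, n_lines_sequenced,
                   min_call_rate=0.90, min_minor_freq=0.10):
    """Apply the two filters: >=90% of lines called, allele frequency >=10%."""
    called = [a for a in alleles if a is not None]
    if len(called) < min_call_rate * n_lines_sequenced:
        return False
    counts = Counter(called)
    if len(counts) < 2:  # monomorphic among called lines
        return False
    return min(counts.values()) / len(called) >= min_minor_freq


def mean_difference(alleles, trait):
    """Absolute difference in mean trait value between two allele classes."""
    groups = {}
    for allele, value in zip(alleles, trait):
        if allele is not None:
            groups.setdefault(allele, []).append(value)
    if len(groups) != 2:
        raise ValueError("sketch assumes exactly two allele classes")
    means = [sum(v) / len(v) for v in groups.values()]
    return abs(means[0] - means[1])


def empirical_p_value(alleles, trait, n_perm=1000, seed=0):
    """Permute trait values across lines and compare with the observed value."""
    rng = random.Random(seed)
    observed = mean_difference(alleles, trait)
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if mean_difference(alleles, shuffled) >= observed - 1e-12:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    alleles = ["A"] * 10 + ["G"] * 10
    maysin = [1.0] * 10 + [0.0] * 10  # toy line means, not study data
    if passes_filters(alleles, n_lines_sequenced=20):
        print(empirical_p_value(alleles, maysin))
```

A site would be called significant in this sketch when its empirical value falls below 0.05, mirroring the 5% empirically derived threshold described in the text.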
Association analysis implements likelihood estimation, thereby incorporating more information into the statistical model through estimates of conditional posterior probability and, consequently, resulting in increased sensitivity of the test (Kao 2000). proc glm in sas allows for computationally straightforward detection of significant differences in phenotypic variation relative to genotype classes using an anova approach, although differences in the estimation of parameters and potentially biased estimation of residual variance exist relative to the likelihood approach (Kao 2000). Under appropriate conditions, the anova and likelihood approaches should provide similar results, making it possible to use anova to confirm the results of association analysis. Indels and SNPs identified in the a1 and whp1 sequences were tested with proc glm in sas. Since a priori knowledge existed that the p region and genetic background of the lines used in this study accounted for significant differences in silk maysin concentrations, genetic background classification (SS, NSS, or ST) and functional versus non-functional classification of the p region were used as class variables when testing the significance of the polymorphism genotypes of a1 and whp1. A multiple-locus model for maysin accumulation in maize silks was created with proc glm in sas testing the functional versus non-functional classification of the p region, the genetic background, the most significant sequence polymorphisms from a1 and whp1 [a1, position 1,369(A/G); whp1, position 261(13/15)], and interaction terms. Loci and interaction terms satisfying the selection criterion of a Type-III sum of squares (P<0.01) were retained in the model along with the main-effect components of the significant interaction terms.
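The multiple-locus model described above can be written out as a fixed-effects linear model. The notation below is an assumed sketch rather than the authors' own formulation: the symbols are introduced here for illustration, and the interaction terms shown are only those named later in the discussion, not necessarily the full set retained under the Type-III criterion.

```latex
% Hypothetical notation for the multiple-locus model (symbols are assumptions):
%   y      : mean silk maysin concentration of a line
%   B_i    : genetic background (SS, NSS, or ST)
%   P_j    : p classification (functional vs. non-functional)
%   A_k    : a1 1,369(A/G) genotype
%   W_l    : whp1 261(13/15) genotype
\[
  y_{ijklm} = \mu + B_i + P_j + A_k + W_l
            + (PA)_{jk} + (BW)_{il} + (BP)_{ij}
            + \varepsilon_{ijklm}
\]
```

Each main effect of a retained interaction stays in the model, consistent with the selection rule stated in the text.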
Silk maysin and CGA concentrations
From anova, neither individual plants within lines nor lines between replications were significant for maysin and/or CGA concentration. Therefore, data from replications were combined to provide line means for association tests. Mean maysin values ranged from 0.00 to 0.87% (±0.019%, experiment-wide) of fresh silk weight, and mean CGA concentrations ranged from 0.00 to 0.18% (±0.007%, experiment-wide) of fresh silk weight (ESM-S1).
In the a1 promoter, 11 SNPs and five indels with a frequency of at least 10% were identified. Eighty-five SNPs and 92 indels were identified throughout the c2 sequence. Thirty-five SNPs and 18 indels were identified in the whp1 promoter, 3′-untranslated region, and first exon, and two indels and 19 SNPs were found in the second exon of whp1. In addition, the T nucleotide at a rare T/G SNP at position 799 in the whp1 sequence would result in a premature stop codon within the first intron. K55, the source of the original mutant (non-functional) whp1 allele (Coe et al. 1981), was one of five lines with the T base, suggesting this site as the cause of the whp1-ref mutant allele. Unfortunately, the sample size of only five lines harboring the position-799 T SNP was too small to test this polymorphism in structured association analysis.
Structured association testing
Pericarp color (p)
Functional alleles at p induce the expression of genes within the phenylpropanoid and flavone pathways to allow the production of maysin and CGA, while non-functional alleles result in the accumulation of lower quantities of CGA and negligible amounts of maysin (Byrne et al. 1996b; Bushman et al. 2002; Szalma et al. 2002). Both a1 and c2 have been previously demonstrated to be under p regulation (Dooner et al. 1980; Grotewold et al. 1994). Therefore, it was necessary to account for the effect of specific p alleles when testing polymorphisms in a1, c2, and whp1.
Table: The relationship of molecular pericarp color (p) classification to the visual p phenotype (n=76)
A strong relationship exists between p classification and SSR-based population structure (ESM-S1). Lines classified as SS were primarily of the ‘p1 only’ class, and the ST lines were mostly ‘p2 only’. The alleles ‘p1 only’ and ‘p2 only’ were approximately equally represented within the NSS subpopulation. Therefore, SSR population structure largely accounts for differences among the functional p classes within the SS and ST groups. Based on these results, structured association analysis was performed with SSR-based population structure estimates and the classification of functional versus non-functional p alleles.
Significant sites identified with structured association analysis
white pollen1 (whp1)
Population structure estimates from SSR data and corrections for the p and a1 1,369(A/G) genotypes were used in structured association tests of whp1 polymorphisms. One significant association was detected between a 13-bp and a 15-bp divergent sequence and maysin accumulation at position 261 [whp1 261(13/15)] (Table 2, Fig. 3). Lines with the whp1 261(13) allele synthesized 0.225% of the fresh silk weight of maysin versus 0.142% in lines with the whp1 261(15) allele. This site was only significant in tassel after conditioning the association tests on both the p-allele class and the a1 1,369(A/G) polymorphism. No significant associations were detected between polymorphisms of the second exon of whp1 and either silk maysin or CGA accumulation.
Structured association tests were conducted with and without correction for the p genotype. No significant association was found between any polymorphism of c2 and either the maysin or CGA phenotype. Additional correction for the genotype of the a1 1,369(A/G) SNP did not alter this result.
Model for maysin synthesis
Table 3 Multiple-locus model of maysin synthesis. Total R² of model = 0.360. Recoverable entries: column heading Pr > F; interaction term p × a1 1,369(A/G)
Epistatic interactions in association testing
The multiple-locus model for maysin synthesis provides further support for the significance of the a1 1,369(A/G) and whp1 261(13/15) polymorphisms and for the epistatic interaction of p and a1 (Table 3). The importance of conditioning the association analysis by genetic background is verified by the significance of the interaction terms of background with whp1, and with p classification. In our study, a large portion of the “relatedness” effect in background is due to the non-random distribution of specific functional p alleles among the three maize subpopulations (ESM-S1). The retention of significant interaction terms between individual loci and genetic background in the multiple-locus model suggests the presence of other epistatic factors, in addition to p classification, for maysin synthesis. Presumably, epistatic interactions play a major role in explaining the background effect seen in testing other candidate genes and traits.
Biological mechanisms of polymorphism effects
The two significant sites within a1 were in substantial LD with one another, making it difficult to separate an individual polymorphism’s effect on phenotype. Because no significant difference was detected between mean maysin concentrations of the a1 1,369(G)/a1 1,600(−) and the a1 1,369(G)/a1 1,600(+) combinations of alleles, the a1 1,369(A/G) polymorphism is likely the more important polymorphism affecting maysin accumulation. However, the low number of lines in this comparison limits the strength of this conclusion. Both significant polymorphic sites are in positions identified as potentially affecting transcription (Schwarz-Sommer et al. 1987). The a1 1,369(A/G) SNP is in a proposed CAAT box (1,367–1,370 bp) within the upstream a1 promoter (1,367–1,734 bp). Individuals with the a1 1,369(A) allele at this position synthesized more maysin on average than individuals with a1 1,369(G); therefore, disruption of the CAAT box at 1,369 bp results in a significant decrease in silk maysin. In the context of our current model of flavone biosynthesis, a decreased function at a1 should result in increased maysin accumulation (Fig. 1) (McMullen et al. 2001b). The initial methionine residue in the DFR protein is coded for by the ATG sequence at position 1,843. Several ATGs that could be used as start sites (positions 1,563, 1,631, and 1,667) exist between the upstream promoter and the start site at position 1,843. Translation initiation from any ATG site prior to position 1,843 would result in the early termination of protein synthesis. We propose that the a1 1,369(G) polymorphism weakens the upstream promoter, thereby enhancing transcription initiation at the downstream promoter immediately before the ATG for the start of the DFR protein.
The deletion at a1 1,600(+/−), resulting in reduced maysin accumulation, includes an 8-bp section of the a1 promoter identified as an enhancer (Schwarz-Sommer et al. 1987). Again, disruption of a sequence element postulated to support a1 transcription results in enhanced expression of maysin from an alternative pathway. Tuerck and Fromm (1994) performed a partial deletion analysis of the a1 promoter; however, they did not analyze the sequence upstream of position 1,620. All regulatory elements proposed within the region they investigated are conserved throughout all the lines in this study.
Our data suggest that the natural genetic variation within the a1 promoter results in phenotypic variation for maysin synthesis. We favor the hypothesis in which the change of an A to a G in an upstream CAAT box enhances transcription initiation at the downstream regulatory sequence. The increased transcription of a1 would promote the synthesis of 3-deoxyanthocyanins and anthocyanins and decrease maysin synthesis (McMullen et al. 2001b). We cannot rule out the possible additional involvement of the enhancer element deletion. Additionally, we recognize that we cannot rule out potential alternative linked sites located in flanking regions not sequenced in this study. An examination of relative transcript abundances between the promoter types is needed to determine if our hypotheses are true. Our results support earlier findings from QTL studies that a1 is a QTL for maysin accumulation.
Chalcone synthase loci
No significant associations were detected between c2 and either maysin or CGA concentration. One significant association was detected at the P<0.05 level in the whp1 promoter region for maysin accumulation when association tests were conditioned on p and the a1 1,369(A/G) polymorphism. The significant polymorphism detected in the whp1 promoter began at 261 bp and continued through either 273 bp or 275 bp depending on whether the whp1 261(13) or whp1 261(15) allele was present, respectively. No important regions of the whp1 promoter before 289 bp, where the first CAAT signal is reported, are identified in the literature (Franken et al. 1991), and the biological basis for the effect of this whp1 polymorphism is unknown.
The CHS function is required for the production of viable pollen. The flavonoids quercetin (3,4-dihydroxyflavonol) and kaempferol are needed in either pollen or silk for pollen-tube function. The lack of functional alleles at both c2 and whp1 results in conditional male fertility in maize (Mo et al. 1992; Pollak et al. 1995). Functional diversity in c2 was not observed, and only a single significant polymorphism in whp1 was detected. Functional diversity within a candidate gene may be reduced due to natural and human selection against deleterious effects of polymorphisms and can only accumulate to the level allowed by the most essential process affected by that gene.
These observations help explain why p, rather than the CHS loci, is commonly the major QTL for maysin. The p gene is the major QTL for maysin synthesis not only because it controls expression of the pathway, but also because allelic variation is generally not selected against in modern maize inbreds, as it controls non-essential pathways. Therefore, non-functional alleles of p, along with an extensive array of alleles expressing tissue-specific variation, can be maintained in inbred maize germplasm. The expression of CHS loci c2 and whp1 is a required function that can occur independently of p for the synthesis of the flavonols and is maintained by selection.
Prior QTL analyses have demonstrated that a1, c2, and whp1 are all candidate genes for QTLs for maysin synthesis. Analysis of these loci by association analysis is complicated by the fact that the major genetic factor for maysin synthesis is the genotype of the p locus. All of the lines in this study were classified based on p-allele genotype, the effect of the different p-allele classes on maysin synthesis was determined, and association analysis was conditioned on both population structure and the p-allele classification. Two sequence polymorphisms in the a1 promoter were significant for maysin synthesis. Both polymorphisms alter sequences implicated in transcription regulation. By including an a1 polymorphism as an additional factor in association analysis, a significant sequence polymorphism was detected in whp1. These results expand the application of association techniques beyond single-locus analysis and emphasize the need for understanding the relative roles of loci with major effects on trait expression. Association analysis can be used to test secondary QTL if, and probably often only if, the primary QTL(s) for the trait is/are known and are considered in the analysis model.
The authors would like to thank Katherine Houchins and Chris Browne for technical assistance and Sherry Flint-Garcia and Jim Holland for reviewing the manuscript and for helpful suggestions. This research was supported by USDA-National Research Initiative, Plant Genome Grant no. 2001-35301-10581 and funds provided by USDA-Agricultural Research Service. SJS was supported by a University of Missouri-Molecular Biology Program Predoctoral Fellowship. The names of products are necessary to report factually on available data; however, neither the USDA nor the University of Missouri guarantees or warrants the standard of the product, and the use of the name does not imply approval of the product to the exclusion of others that may also be suitable.