Association analysis of physicochemical traits on eating quality in rice (Oryza sativa L.)
- Cite this article as:
- Zhao, W., Chung, J., Kwon, S. et al. Euphytica (2013) 191: 9. doi:10.1007/s10681-012-0820-z
Improvement of rice eating quality is an important objective in current breeding programs. In this study, 130 rice accessions of diverse origin were genotyped using 170 SSR markers to identify marker–trait associations for physicochemical traits related to eating quality. Analysis of population structure revealed four subgroups in the population. Linkage disequilibrium (LD) patterns and distributions are of fundamental importance for genome-wide association mapping. The mean r2 value for all intrachromosomal loci pairs was 0.0940. LD between linked markers decreased with distance. Marker–trait associations were investigated using the unified mixed-model approach, considering both population structure (Q) and kinship (K). In total, 101 marker–trait associations (p < 0.05) were identified using 52 different SSR markers covering 12 chromosomes. The results suggest that association mapping in rice is a viable alternative to quantitative trait loci mapping, and the detection of new marker–trait associations for rice eating quality will also provide important information for marker-assisted breeding and functional analysis of rice grain quality.
Keywords: Rice; Association mapping (AM); Linkage disequilibrium (LD); Population structure; Eating quality
Rice is the staple food for about half of the world’s population. Improvement of rice quality is among the most important aims in current breeding programs, especially eating and cooking quality, because most rice is consumed cooked. Rice quality is a complex trait consisting of many components such as milling, appearance, nutrition, cooking and eating qualities. Among these properties, consumers pay most attention to fine appearance and high eating quality (Huang et al. 1998; Wan et al. 2004). Rice eating quality has usually been evaluated using three major physicochemical characteristics of the starch as indirect indexes, namely, amylose content (AC) (Juliano 1985), gel consistency (GC) (Cagampang et al. 1973), and gelatinization temperature (GT) (Little et al. 1958). GT is a physical trait that determines cooking time and the capacity to absorb water during cooking, GC reflects softness, and AC affects texture and appearance. Hence, regulating AC in rice has been a major concern of rice breeders.
To facilitate the development of new varieties with high eating quality, it is necessary to understand the genetic basis of such traits. So far, most research on cooking and eating quality has focused on AC, GC, and GT, and a few attempts have been made to identify quantitative trait loci (QTLs) associated with eating quality in rice using traditional linkage-mapping methods. In these studies, major QTLs for AC and GC were allelic or tightly linked to the Waxy (Wx) gene on chromosome 6 (Lanceras et al. 2000), and major QTLs for GT were allelic or closely linked to the alk gene on chromosome 6 (He et al. 1999; Lanceras et al. 2000). Furthermore, AC, GC, and GT were found to be controlled by the linked genomic region near the Wx locus (Tan et al. 1999). Umemoto et al. (2002) confirmed this and demonstrated that the alk locus encodes the enzyme soluble starch synthase IIa. Bao et al. (2002) also found an effect of the wx region on GC. However, He et al. (1999) and Bao et al. (2002) showed that GC is controlled by two QTLs with minor effects. Li et al. (2004) identified four QTLs for AC, three for GT and five for GC using backcross-inbred lines. The four QTLs for AC were located on chromosomes 3, 4, 5 and 6. The QTL on chromosome 6 covered the wx gene region and mainly contributed to the variance between japonica and indica varieties (Li et al. 2004). Fan et al. (2005) detected a total of 12 main-effect QTLs for the three traits, with a QTL corresponding to the wx locus showing a major effect on AC and GC, and a QTL corresponding to the alk locus having a major effect on GT. Sun et al. (2006) found seven different QTLs, including two for AC, three for GT and two for GC, at six chromosomal regions controlling complex traits related to rice grain quality. Wada et al. (2006) mapped four QTLs for AC. Amarawathi et al. (2008) mapped QTLs related to AC, GC and GT on seven different chromosomes. Lestari et al. (2009) reported 18 DNA markers associated with eating quality of temperate japonica rice.
Although traditional QTL mapping will continue to be an important tool in gene tagging of crops, it is very costly overall (Hansen et al. 2001; Stella and Boettcher 2004; Gupta et al. 2005; Stich et al. 2006) and has low resolution, evaluating only a few alleles simultaneously over a long research timescale (Cardon and Bell 2001; Flint-Garcia et al. 2003; Stich et al. 2005; Roy et al. 2006). These limitations, however, can be reduced with the use of “association mapping” (Ross-Ibarra et al. 2007). This methodology has become popular in human genetics as association mapping (AM) or linkage disequilibrium (LD) mapping, and has led to many successes. The basic objective of AM studies is to detect correlations between genotypes and phenotypes in a sample of individuals on the basis of LD (Zondervan and Cardon 2004). In plant genetics, AM offers the important advantage of sampling unrelated individuals in a population, as compared to other experimental designs that require sampling within families (Risch 2000), capturing broader genetic variation in a more representative genetic background. Of particular interest to rice breeders is the possibility of using existing germplasm resources for gene and allele discovery on the basis of AM strategies (Farnir et al. 2000). Moreover, there is no need to develop expensive and tedious biparental populations, which makes the AM approach time-saving and cost-effective. In plants, the model species Arabidopsis provided some of the first examples of AM through identification of previously cloned flowering time genes (Aranzana et al. 2005), and AM has now been applied successfully to many plant species, such as maize (Remington et al. 2001), rice (Agrama et al. 2007), wheat (Breseghello and Sorrells 2006; Tommasini et al. 2007), barley (Kraakman et al. 2006), oilseed rape (Hasan et al. 2008), Brassica rapa (Zhao et al. 2005), soybean (Jun et al. 2008) and G. hirsutum (Abdurakhmonov et al. 2008).
Therefore, the objective of the present study was to use a large collection of rice accessions to determine the utility of population structure analysis and LD, and to identify markers associated with physicochemical traits involved in rice eating quality using the AM approach.
Materials and methods
A core set of 166 rice accessions was previously developed from 4,406 worldwide varieties collected from the National GeneBank of the Rural Development Administration (RDA-GeneBank), Republic of Korea, using the PowerCore program (Kim et al. 2007). Based on the available SSR genotyping data, and after removing accessions with >25 missing data, 130 rice accessions from 28 different countries were finally selected for the marker–trait association studies (Supplemental Table 1). Young leaves were sampled and stored at −80 °C until genomic DNA extraction using the CTAB method.
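The missing-data filter described above can be sketched as follows. This is a minimal illustration, not the procedure actually used: the data layout, the function names, and the choice to express the threshold as a fraction of markers are all assumptions, since the text does not state the unit of the ">25" cutoff.

```python
from typing import Dict, List, Optional

# Hypothetical layout: accession name -> one allele size per SSR marker,
# with None for a missing call. All names and values are illustrative.
Genotypes = Dict[str, List[Optional[int]]]


def missing_fraction(calls: List[Optional[int]]) -> float:
    """Fraction of markers with no allele call for one accession."""
    if not calls:
        raise ValueError("accession has no marker calls")
    return sum(1 for c in calls if c is None) / len(calls)


def filter_accessions(genotypes: Genotypes, max_missing: float) -> Genotypes:
    """Keep accessions whose missing fraction does not exceed max_missing."""
    return {
        name: calls
        for name, calls in genotypes.items()
        if missing_fraction(calls) <= max_missing
    }


if __name__ == "__main__":
    demo: Genotypes = {
        "acc_A": [120, 134, None, 150],   # one of four calls missing, kept
        "acc_B": [None, None, 140, 150],  # two of four calls missing, dropped
    }
    print(sorted(filter_accessions(demo, max_missing=0.25)))
```

Passing the threshold as a parameter keeps the sketch usable whichever interpretation of the cutoff applies.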
Evaluation of physicochemical traits for eating quality
Correlation coefficients among traits in 130 rice accessions
In total, 170 SSR markers located on the 12 chromosomes were selected for population structure analysis and AM (Supplemental Table 2). All SSR markers were obtained from GRAMENE (http://www.gramene.org/). A three-primer system (Schuelke 2000) was used, consisting of a universal M13 oligonucleotide (TGTAAAACGACGGCCAGT) labeled with one of three fluorescent dyes (6-FAM, NED, or HEX), which allowed PCR products to be triplexed during electrophoresis; a special forward primer formed by concatenating the M13 oligonucleotide with the locus-specific forward primer; and the normal reverse primer for SSR PCR amplification. Information on primer sequences and PCR amplification conditions for each set of primers is available at http://www.gramene.org/. SSR alleles were resolved on an ABI Prism 3100 DNA sequencer (Applied Biosystems, Foster City, CA, USA) using GENESCAN 3.7 software, and sized precisely using GeneScan 500 ROX (6-carbon-X-rhodamine) molecular size standards (35–500 bp) with GENOTYPER 3.7 software (Applied Biosystems).
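The construction of the tailed forward primer can be illustrated with a short sketch. It assumes, following the usual description of the Schuelke (2000) method, that the M13 sequence is placed at the 5′ end of the locus-specific forward primer; the function name and the example primer are hypothetical and do not correspond to any marker used in this study.

```python
M13_TAIL = "TGTAAAACGACGGCCAGT"  # universal M13 oligonucleotide given in the text


def tailed_forward_primer(forward_primer: str, tail: str = M13_TAIL) -> str:
    """Return the tail followed by the locus-specific forward primer (5'->3')."""
    seq = forward_primer.upper().replace(" ", "")
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("primer must be a non-empty A/C/G/T string")
    return tail + seq


if __name__ == "__main__":
    # Hypothetical forward primer, for illustration only.
    print(tailed_forward_primer("acgtacgtacgtacgtac"))
```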
Allelic diversity and population structural analysis
The number of alleles, gene diversity (GD), and polymorphism information content (PIC) per locus were calculated with the PowerMarker 3.25 program (Liu and Muse 2005). A tree using the unweighted pair group method with arithmetic mean (UPGMA) based on the genetic distance matrix was constructed using the MEGA 4.0 program (Tamura et al. 2007). The AM population was analyzed for possible population structure with the model-based Structure 2.2.3 program (Pritchard et al. 2000; Falush et al. 2003) for the 130 rice accessions using a burn-in of 50,000, run length of 100,000, and a model allowing for admixture and correlated allele frequencies. Five runs of Structure were performed by setting the number of populations (K) from two to nine, and an average likelihood value, L(K), across all runs was calculated for each K. The model choice criterion to detect the most probable value of K was ΔK, an ad hoc quantity related to the second-order change of the log probability of data with respect to the number of clusters inferred by Structure (Evanno et al. 2005). The highest ΔK of the data was observed for K = 4 clusters of plants. Therefore, the Q matrix as the average of five runs for K = 4 was calculated. The correlation of alleles within subpopulations, FST, was calculated using analysis of molecular variance (AMOVA) with the program Arlequin 3.11 (Excoffier et al. 2005).
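As an illustration of the per-locus diversity statistics named above, the following sketch computes allele frequencies, gene diversity (GD) and PIC from one locus. It uses the standard textbook definitions, GD = 1 − Σp_i² and PIC = 1 − Σp_i² − Σ_{i<j} 2p_i²p_j²; any sample-size correction that PowerMarker may apply is omitted, and the function names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, Optional, Sequence


def allele_frequencies(calls: Sequence[Optional[Hashable]]) -> Dict[Hashable, float]:
    """Allele frequencies at one locus, ignoring missing calls (None)."""
    counts = Counter(c for c in calls if c is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no allele calls at this locus")
    return {allele: n / total for allele, n in counts.items()}


def gene_diversity(freqs: Dict[Hashable, float]) -> float:
    """GD = 1 - sum(p_i^2), without small-sample correction."""
    return 1.0 - sum(p * p for p in freqs.values())


def pic(freqs: Dict[Hashable, float]) -> float:
    """PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2."""
    ps = list(freqs.values())
    homozygosity = sum(p * p for p in ps)
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(ps, 2))
    return 1.0 - homozygosity - cross


if __name__ == "__main__":
    # Hypothetical allele sizes (bp) for one SSR locus; None is a missing call.
    locus = [120, 120, 124, 128, None, 124, 120, 128]
    freqs = allele_frequencies(locus)
    print(len(freqs), round(gene_diversity(freqs), 3), round(pic(freqs), 3))
```

PIC is never larger than GD for the same locus, because the cross term is non-negative.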
Linkage disequilibrium decay
LD values (r2) between SSR loci on the same chromosome were calculated using the TASSEL program 2.2.3 (http://www.maizegenetics.net), with the rapid permutation test in 10,000 shuffles (Churchill and Doerge 1994). For multiple alleles, a weighted average of r2 (squared allele frequency correlation) between each locus pair was calculated (Farnir et al. 2000), except for 7 microsatellite loci (not mapped or unknown genetic distance). The extent of LD was estimated separately for loci on the same chromosome. Only alleles with a frequency equal to or greater than 0.05 were considered for LD calculation (Thornsberry et al. 2001). The pairs of loci were considered to have a significant LD if p < 0.01. Significance thresholds corrected for multiple testing within chromosomes were approximately proportional to the reciprocal of the number of markers tested in each chromosome. The estimated genetic distance (cM) between loci was inferred from http://www.gramene.org.
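A weighted average r2 for a pair of multi-allelic loci can be sketched as follows. The sketch assumes one allele call per accession at each locus (that is, homozygous inbred lines), weights each allele pair by the product of its allele frequencies, and omits both the 0.05 allele-frequency filter and the permutation test; it is an illustration of the idea, not a reproduction of the TASSEL implementation.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence

Allele = Optional[Hashable]


def weighted_r2(locus_a: Sequence[Allele], locus_b: Sequence[Allele]) -> float:
    """Frequency-weighted mean of pairwise r2 over all allele combinations."""
    if len(locus_a) != len(locus_b):
        raise ValueError("both loci must cover the same accessions")
    # Use only accessions with a call at both loci.
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        raise ValueError("no accession has calls at both loci")
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    if len(count_a) < 2 or len(count_b) < 2:
        raise ValueError("both loci must be polymorphic")
    haplotypes = Counter(pairs)
    total = 0.0
    for a, na in count_a.items():
        p = na / n
        for b, nb in count_b.items():
            q = nb / n
            d = haplotypes[(a, b)] / n - p * q  # disequilibrium coefficient D_ij
            r2 = d * d / (p * (1 - p) * q * (1 - q))
            total += p * q * r2                 # weights p*q sum to 1
    return total


if __name__ == "__main__":
    # Hypothetical allele sizes at two loci on the same chromosome.
    print(weighted_r2([120, 120, 124, 124], [200, 200, 210, 210]))
```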
The hypothesis of the association of SSR markers with eating quality parameters in the presence of population structure was tested using a mixed linear model (MLM), as described by Yu et al. (2006), in the program TASSEL 2.0.1 (http://www.maizegenetics.net/). This recently developed unified mixed-model method simultaneously takes into account multiple levels of both gross population structure (Q) and finer-scale relative kinship (K). The population structure matrix (Q) was identified by running the Structure program at K = 4. The relative kinship matrix (K matrix) was obtained using the program SPAGeDi (Hardy and Vekemans 2002). Output from SPAGeDi was formatted to a text file readable in TASSEL. All SSR markers were used for marker–trait associations. Only markers with an allele frequency of 5 % or more were included in the association analysis. To control type I error, p values were adjusted for multiple tests using the procedure proposed by Whitt and Buckler (2003) based on permuted p-values of random markers. The p-value indicates whether a marker is associated with a trait, and the marker r2 estimates the magnitude of the QTL effect. MapChart 2.2 was used to draw the map (Voorrips 2002).
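For reference, the unified mixed model of Yu et al. (2006) is commonly written in the form given here. The symbols are not defined in this article and are stated as assumptions: y is the vector of phenotypic observations, β the fixed effects other than marker or population effects, α the marker effects, v the population (subgroup) effects, u the polygenic background effects, and e the residuals; Q is the membership matrix from Structure, K the relative kinship matrix, and X, S and Z are incidence matrices.

```latex
\[
  y = X\beta + S\alpha + Qv + Zu + e,
  \qquad
  \mathrm{Var}(u) = 2K V_g,
  \qquad
  \mathrm{Var}(e) = R V_R
\]
```

Here V_g and V_R denote the genetic and residual variances, and R is the residual covariance structure. Note that K in this expression is the kinship matrix, not the number of Structure clusters.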
Descriptive statistics of the traits for eating quality in 130 rice accessions
Genetic diversity and population structure
Analysis of molecular variance (AMOVA) for the model-based four clusters of rice accessions (table columns: source of variation, sum of squares, percentage of variation)
Model-based population pair-wise FST between four clusters
Level of LD among intra-chromosomal SSR loci
The squared allele frequency correlations (r2) were obtained by analysis of 851 intra-chromosomal loci pairs using the 163 selected SSR markers. The r2 values ranged from 0.0071 to 0.7746 for all intra-chromosomal loci pairs, with an average of 0.0940. The r2 between markers on the same chromosome was mostly less than 0.10. Of the 851 assessed loci pairs, only 270 had r2 of more than 0.10 (31.7 %). The distribution of data points in the plot of LD (r2) decay against distance (cM) within the 12 chromosomes showed that LD was not a simple monotonic function of the distance between markers. However, r2 decreased as genetic distance between loci pairs increased, indicating that the probability of LD is low between distant locus pairs. The highest scores for the frequency of loci pairs in LD and the highest mean r2 were reported for loci pairs that mapped within <20 cM of each other, suggesting it should be possible to achieve resolution down to the 20 cM level, with r2 > 0.10 at p < 0.01.
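The kind of summary reported above (proportion of locus pairs with r2 > 0.10, and mean r2 by distance class) can be sketched by binning locus pairs by genetic distance. The 20 cM bin width and 0.10 threshold follow the text; the function name and the example data are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def summarize_ld_by_distance(
    pairs: Sequence[Tuple[float, float]],
    bin_width: float = 20.0,
    r2_threshold: float = 0.10,
) -> List[Dict[str, float]]:
    """Group (distance_cM, r2) pairs into distance bins and summarize each bin."""
    bins: Dict[int, List[float]] = {}
    for distance, r2 in pairs:
        bins.setdefault(int(distance // bin_width), []).append(r2)
    summary = []
    for index in sorted(bins):
        values = bins[index]
        summary.append({
            "start_cM": index * bin_width,
            "end_cM": (index + 1) * bin_width,
            "n_pairs": len(values),
            "mean_r2": sum(values) / len(values),
            "frac_above": sum(v > r2_threshold for v in values) / len(values),
        })
    return summary


if __name__ == "__main__":
    # Hypothetical (distance in cM, r2) values for intra-chromosomal locus pairs.
    demo = [(5.0, 0.30), (15.0, 0.10), (25.0, 0.05), (70.0, 0.02)]
    for row in summarize_ld_by_distance(demo):
        print(row)
```

With the counts given in the text, the proportion of pairs above the threshold is 270/851 ≈ 31.7 %.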
Studies of allelic diversity have proved fruitful in understanding the genetic basis of complex traits. The allelic richness of 10.4 observed in our study was similar to that previously reported by Garris et al. (2003) (mean 11.8) using 169 SSRs and 234 rice accessions, indicating a high level of allelic diversity. Allelic richness was significantly associated with the genetic diversity indices: the correlation coefficients (r) between allelic richness and GD and between allelic richness and PIC were 0.757 and 0.789, respectively.
Grain quality in rice has received increasing attention in recent years. The application of mapping for eating quality may greatly facilitate improvement of grain quality in rice. Diverse populations are required for accurate association mapping based on LD. The Structure program implements a model-based clustering method for inferring population structure using genotype data consisting of unlinked markers (Pritchard et al. 2000). The model does not assume a particular mutation process and, in most cases, the estimated ‘log probability of data’ does not provide an accurate estimate of the number of clusters, K (Evanno et al. 2005). In this study, the distribution of L(K) did not show a clear mode for the true K, but ΔK did show a clear peak at the true value of K (Evanno et al. 2005). Using Structure with K = 4, the rice accessions were significantly differentiated into four subgroups (S1–S4). The relatively small value of alpha (α = 0.035) indicates that most accessions in each subgroup originated from one primary ancestor, with a few admixed individuals. The analysis revealed that some accessions with partial ancestry probably had a complex breeding history involving intercrossing and introgression among germplasms from diverse backgrounds, overlaid with strong selection pressure for agronomic and quality characteristics (Mather et al. 2004). Model-based analyses of population structure, such as the one conducted here, may be helpful in providing information that could be incorporated into association mapping analysis.
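The ΔK criterion discussed above can be sketched as the absolute second-order difference of the mean L(K), divided by the standard deviation of L(K) across runs. This simplified form, which uses run means, is an assumption about how the statistic was computed here; the function name and the log-likelihood values are hypothetical and are not results from this study.

```python
from statistics import mean, stdev
from typing import Dict, List


def delta_k(lnp_runs: Dict[int, List[float]]) -> Dict[int, float]:
    """Delta K per K from replicate L(K) values (at least two runs per K)."""
    ks = sorted(lnp_runs)
    avg = {k: mean(lnp_runs[k]) for k in ks}
    result: Dict[int, float] = {}
    for k in ks:
        if k - 1 not in avg or k + 1 not in avg:
            continue  # undefined at the ends of the tested range of K
        spread = stdev(lnp_runs[k])
        if spread == 0:
            continue  # skip K where all runs agree exactly (division by zero)
        second_difference = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
        result[k] = second_difference / spread
    return result


if __name__ == "__main__":
    # Hypothetical L(K) values from two runs at each K.
    runs = {
        2: [-100.0, -102.0],
        3: [-80.0, -82.0],
        4: [-50.0, -52.0],
        5: [-48.0, -50.0],
        6: [-47.0, -49.0],
    }
    dk = delta_k(runs)
    print(max(dk, key=dk.get), dk)
```

Because ΔK needs both neighbours of each K, a scan from K = 2 to K = 9 yields values only for K = 3 to K = 8.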
The ability to detect significant associations between molecular polymorphisms and particular phenotypes, as well as the resolving power of LD mapping techniques, depends on knowledge of the extent of LD in the genome and the rate of decay of LD with physical distance (Pritchard et al. 2000). D′ and r2 are the most commonly used measures of LD (Gaut and Long 2003; Gupta et al. 2005), but r2 has more reliable sampling properties than D′ in cases with low allele frequencies, especially for self-pollinated species such as rice (Abdurakhmonov and Abdukarimov 2008). In large sets of rice accessions with diverse origins such as those examined here, r2, which is also an indication of marker–trait correlations, is the most appropriate LD measure for association mapping (Gupta et al. 2005). In our study, the intra-chromosomal LD decay was up to 10 cM for all of the 130 accessions. Extensive LD has also been reported in other selfing species. Malysheva-Otto et al. (2006), for instance, reported that intra-chromosomal LD extended up to 50 cM (r2 > 0.05) in 953 barley accessions. Although this level of LD persistence is considered to be high, long-distance LD of up to 50–100 cM (r2 > 0.2) has also been reported in some local populations of Arabidopsis accessions (Nordborg et al. 2002). Additionally, long-distance LD of up to 100 cM (r2 > 0.1) was detected in a population of European two-row spring barley (Kraakman et al. 2004). Many studies have examined LD in rice (Garris et al. 2003; Rakshit et al. 2007; Mather et al. 2007; Agrama et al. 2007; Agrama and Eizenga 2008), although the amount of LD varies across the genome due to such factors as recombination rate, mutation, population structure, relatedness, outcrossing, and selective pressure (Gupta et al. 2005; Abdurakhmonov and Abdukarimov 2008).
There is evidence that LD is remarkably different in different rice species; therefore, our results could have important implications for association testing in rice.
The population structure and relatedness observed among the rice accessions indicate that both population structure and kinship should be considered when conducting population-based association mapping in rice germplasm, particularly with our material, where predefined groups contained unbalanced numbers of accessions (Pritchard et al. 2000; Yu et al. 2006). Association mapping without consideration of population structure would give a high rate of false positives (type-I errors) (Mather et al. 2004). A unified mixed-model approach that accounts for multiple levels of relatedness simultaneously, as detected by genetic markers, has resulted in improved control of both type-I and type-II error rates (Yu et al. 2006). Hence, we applied the mixed linear model (MLM) approach of Yu et al. (2006), considering both population structure (Q) and kinship (K), to eliminate possible spurious associations. This approach successfully identified a number of SSR markers that are significantly associated with physicochemical traits controlling rice eating quality.
In this study, twelve parameters representing eating quality of rice were used to analyze the marker–trait associations using 170 SSR markers. Some markers were associated simultaneously with two or more traits, which might be the genetic reason for correlation among traits as well as for pleiotropic effects of the gene(s) (Koyama et al. 2001). There have been several other reports on QTL analysis for rice eating quality, revealing that some rice physicochemical properties such as AC, GT and GC are controlled by one to three major genes. The enzymes involved in starch biosynthesis, such as starch branching enzyme (SBE), starch synthase (SS), and granule bound starch synthase (GBSS), contribute greatly to the variation in starch physicochemical properties and thus to eating quality, and have been used for ideotype breeding for eating quality by marker-assisted selection (Bao et al. 2006; Jin et al. 2010; Jantaboona et al. 2011). In the present study, the physicochemical properties related to eating quality were not significantly associated with these genes; only SS was associated with BD. Bao et al. (2008) also found that the starch properties were not associated with the starch branching enzyme 1 (SBE 1) gene alleles. On chromosome 6, we found three markers (RM276, RM5815 and SS) associated with AC, BD and BD, respectively. Major genes such as Wx (waxy gene) and alk (starch synthase II) (Lanceras et al. 2000) are associated with eating quality (Bao et al. 2006, 2008), and AC, GC, and GT are controlled by the linked genomic region near the Wx locus (Tan et al. 1999). RM5815 and RM276 are located in regions proximal to Wx and alk, respectively, so their associations may primarily reflect linkage (Larkin et al. 2003). Yuan et al. (2010) identified 28 QTLs for 14 grain quality traits using inter mapping, and a few of their QTLs on chr. 6 shared similar genomic regions; similarly, RM144 on chr. 11 was associated with fb2 in our study, whereas they found it to be associated with spr11.2.
Furthermore, although it is difficult to directly compare the chromosomal locations of the marker–trait associations detected in this study with previously reported QTLs because different materials and molecular markers were used, most marker–trait associations were in regions where QTLs for the given trait had previously been identified, and some were located in similar or proximal regions related to starch synthesis (http://www.gramene.org/). The new markers related to eating quality will facilitate the understanding of QTLs and marker-assisted selection (MAS).
In conclusion, whole-genome association studies have the advantage of enabling the entire genome, rather than specific genes, to be assessed for trait-associated variants. Application of association mapping to plant breeding is a promising means of overcoming the limitations of conventional linkage mapping. The results of our study demonstrate the significant potential of LD-based association mapping of physicochemical traits related to eating quality in rice accessions with SSR markers. This type of mapping could be a useful alternative to linkage mapping for detecting marker–phenotype associations toward implementation of marker-assisted selection. The findings from the present study should provide important information for functional analysis of rice eating quality and should also be useful for MAS in rice breeding programs aimed at developing new varieties with a high level of eating quality.
This study was supported by a grant from the BioGreen 21 Program (No. PJ009099), Rural Development Administration, Republic of Korea.