Fine mapping of the awn gene on chromosome 4 in rice by association and linkage analyses

Awnness is a key trait in rice domestication, yet no studies have been conducted on fine mapping or association mapping of the rice awn gene. In this study, we investigated the awnness and genotypes of a core collection of 303 cultivated rice varieties and of a BC5F2 segregating population of 200 individuals. By combining association and linkage analyses, we mapped awnness-related genes on chromosome 4. A preliminary association analysis using 24 SSR markers revealed five loci on chromosome 4 significantly associated with awnness, and these associated markers cover previously identified regions. Fine association mapping was then conducted using another 29 markers within a 4-Mb region around the associated marker in34, which is close to the awn gene Awn4.1. Seven associated markers were revealed, distributed over an 870-kb region. Combining this fine association mapping with linkage analysis of awnness in the 200-plant BC5F2 segregating population, we finally identified a 330-kb region as the candidate region for Awn4.1. The results indicate that combining association mapping and linkage mapping provides an efficient and precise approach to both genome-wide mapping and fine mapping of rice genes.

The awn is an important trait in rice evolution and production. Wild rice, for example, has long awns, which aid seed dispersal and protect the grains from animals. By contrast, most cultivated rice varieties lack long awns, which makes harvesting more convenient. Efforts to uncover the genetic mechanisms underlying rice awnness began in the 1960s and 1970s [1][2][3]. These studies suggested that rice awnness is a complex trait regulated by multiple genes whose expression is affected by the environment. The first quantitative trait loci (QTLs) for awnness were reported in 1963 [2]. Three genes, An-1, An-2 and An-3, mapped to Chr 3, Chr 4 and Chr 5, respectively, were reported to control various degrees of awning through additive effects [2,4]. It was not until 1999 that the first molecularly mapped awn gene, an-5(t), was located in the 33.70-75.70 cM region of Chr 4 by Xiong et al. [5], using an F2 population derived from a cross between the wild rice P16 and the indica variety Aijiaonante. Thomson et al. [6] mapped an awn gene, Awn4.1, to the 0-14.6 cM region of Chr 4. Further major genes/QTLs have since been mapped using cross populations. A total of 31 loci associated with awn presence or awn length in rice have been reported (http://www.gramene.org/), seven of which were mapped to chromosome 4 (Table 1). However, none of these QTLs has been fine mapped or identified by association analysis.
Linkage analysis (LA) can detect only the recombination events at a few relevant loci, each with two alleles, in a segregating population produced by a bi-parental cross. By contrast, association mapping exploits the recombination events accumulated in natural populations over long-term evolution and domestication; it therefore offers higher mapping resolution and can be equally effective even with a smaller sample population [10][11][12][13]. Moreover, association mapping can detect most of the loci controlling the same complex trait, as well as multiple alleles at the same locus. Association mapping was first used to detect alleles underlying human diseases and has since been widely applied to gene mapping in plants [14][15][16][17][18][19][20][21].
In this study, the population used for association mapping consisted of 303 cultivated rice varieties from a rice mini core collection (204 Chinese varieties [22] and 99 varieties from other countries; Table S1). The awnness of each variety was evaluated in Sanya (18°09′N) and Hangzhou (29°44′N) in 2006 and in Beijing (39°56′N) in 2009. A variety was classified as awned if more than 30% of its florets had awns in any of the experimental locations. Eighty awned varieties were observed (Table S1), 40% of which showed awns in all three locations.
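The scoring rule above can be written as a small predicate. This is a minimal sketch: the per-location input format (fraction of florets bearing awns, keyed by location name) and the variety names are hypothetical; only the strict 30% threshold and the "any location" rule come from the text.

```python
from typing import Mapping

AWN_THRESHOLD = 0.30  # "over 30% of the florets", i.e. strictly greater than 0.30


def is_awned(fraction_by_location: Mapping[str, float],
             threshold: float = AWN_THRESHOLD) -> bool:
    """Awned if the awned-floret fraction exceeds the threshold at ANY location."""
    return any(frac > threshold for frac in fraction_by_location.values())


# Hypothetical scores for two varieties (fraction of florets with awns).
varieties = {
    "variety_A": {"Sanya": 0.05, "Hangzhou": 0.45, "Beijing": 0.10},
    "variety_B": {"Sanya": 0.00, "Hangzhou": 0.30, "Beijing": 0.12},
}
awned = sorted(name for name, fr in varieties.items() if is_awned(fr))
print(awned)  # ['variety_A']
```

variety_B is scored awnless because exactly 30% does not satisfy "over 30%".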
Population structure arising from geographical origin, local adaptation and breeding history can produce linkage disequilibrium (LD) even between unlinked loci; association mapping therefore has a higher probability of type I errors than linkage analysis. To account for this, we included each individual's membership (its Q value for the subpopulations) as a covariate in the association analysis. The model-based (Bayesian) clustering software STRUCTURE 2.2 [23] was used to estimate the population structure of the 303 rice varieties from 60 unlinked polymorphic markers distributed across all rice chromosomes. Using a burn-in of 10000 iterations and a run length of 100000, with the admixture model and correlated allele frequencies, each K (the inferred number of subpopulations) from K=1 to K=10 was run independently ten times. We selected K by examining the values of LnP(D) and ΔK [24]. Our results [25] indicated two distinctly divergent subpopulations, and the membership coefficient of each variety, i.e. the Q value, was estimated.
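ΔK [24] measures the second-order rate of change of LnP(D) across K, scaled by the spread of LnP(D) among replicate runs. The sketch below assumes the common run-mean formulation used by tools such as Structure Harvester, ΔK(K) = |mean L(K+1) − 2·mean L(K) + mean L(K−1)| / sd L(K); the function name is hypothetical and the LnP(D) values in the example are invented for illustration, not the study's results.

```python
import statistics
from typing import Dict, Mapping, Sequence


def delta_k(lnp_by_k: Mapping[int, Sequence[float]]) -> Dict[int, float]:
    """Evanno-style Delta K from replicate STRUCTURE runs.

    lnp_by_k maps each K to the LnP(D) values of its replicate runs.
    Delta K is defined only for K whose neighbours K-1 and K+1 were also run.
    """
    means = {k: statistics.mean(v) for k, v in lnp_by_k.items()}
    result = {}
    for k in sorted(lnp_by_k):
        if k - 1 in means and k + 1 in means:
            second_diff = means[k + 1] - 2 * means[k] + means[k - 1]
            sd = statistics.stdev(lnp_by_k[k])  # spread across replicate runs
            result[k] = abs(second_diff) / sd if sd > 0 else float("inf")
    return result


# Invented LnP(D) values (three runs per K), NOT the study's results.
example = {
    1: [-5000.0, -5002.0, -4998.0],
    2: [-4200.0, -4205.0, -4195.0],
    3: [-4150.0, -4160.0, -4140.0],
    4: [-4120.0, -4140.0, -4100.0],
    5: [-4100.0, -4130.0, -4070.0],
}
dk = delta_k(example)
best_k = max(dk, key=dk.get)  # K with the largest Delta K
```

ΔK cannot be computed for the smallest and largest K tested, which is why a range such as K=1 to K=10 is run even when a small K is expected.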
Awnness is a binary trait; therefore, in the association analysis of awnness we employed a logistic model:
$$\ln\left(\frac{P}{1-P}\right) = \beta_0 + \beta_1 Q + \beta_2 M$$
where P is the probability that a variety is awned, Q is the individual's membership in the first subpopulation, M is the indicator function of the marker, and β₀, β₁ and β₂ are unknown parameters estimated by maximum likelihood using Proc Logistic in SAS. Because most of our SSR markers are multi-allelic, we transformed the multi-allelic data into biallelic data using Plink (http://pngu.mgh.harvard.edu/~purcell/plink/index.shtml) and excluded rare alleles (frequency less than 5%). To control false positives, we calculated the false discovery rate (FDR) [26] and the adjusted P-value (P_adj) using Proc Multtest in SAS, and considered loci with P_adj < 0.05 to be significantly associated. In addition, we examined the collinearity between population structure and the markers using Proc REG in SAS. The tolerances between population structure and the markers were all above 0.42, well above the threshold of 0.2 below which multicollinearity is considered a problem [27]. This indicates that the association results are not affected by multicollinearity.
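A minimal pure-Python sketch of the per-allele test described above, under stated assumptions: each SSR allele with frequency ≥ 5% becomes its own 0/1 indicator M (our reading of the biallelic transformation; Plink's recoding may differ), the model is fitted by Newton-Raphson maximum likelihood, and β₂ is tested with a two-sided Wald test (Proc Logistic reports several tests and the text does not say which was used). The tolerance helper computes 1 − r² between Q and M, which with only two predictors equals the tolerance of each. All function names are hypothetical.

```python
import math
from collections import Counter


def biallelic_indicators(genotypes, min_freq=0.05):
    """Recode one multi-allelic SSR marker as a 0/1 indicator per allele.

    genotypes holds one allele label per (homozygous) variety; None = missing.
    Alleles with frequency below min_freq are dropped as rare.
    """
    called = [g for g in genotypes if g is not None]
    counts = Counter(called)
    return {
        allele: [None if g is None else int(g == allele) for g in genotypes]
        for allele, c in counts.items()
        if c / len(called) >= min_freq
    }


def _sigmoid(eta):
    """Numerically stable logistic function 1 / (1 + exp(-eta))."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def _invert(a):
    """Gauss-Jordan inverse with partial pivoting (fine for 3x3 matrices)."""
    n = len(a)
    m = [list(row) + [float(i == j) for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [row[n:] for row in m]


def _score_and_information(beta, x, y):
    """Score X'(y - mu) and Fisher information X'WX with W = mu(1 - mu)."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(x, y):
        mu = _sigmoid(sum(b * v for b, v in zip(beta, xi)))
        w = mu * (1.0 - mu)
        for a in range(p):
            score[a] += (yi - mu) * xi[a]
            for b in range(p):
                info[a][b] += w * xi[a] * xi[b]
    return score, info


def fit_logistic(x, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood fit by Newton-Raphson; returns (beta, covariance)."""
    beta = [0.0] * len(x[0])
    for _ in range(max_iter):
        score, info = _score_and_information(beta, x, y)
        step = [sum(r * s for r, s in zip(row, score)) for row in _invert(info)]
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(beta, x, y)  # information at the final estimate
    return beta, _invert(info)


def association_test(awned, q, marker):
    """Fit ln(P/(1-P)) = b0 + b1*Q + b2*M; return (b2, two-sided Wald p-value)."""
    rows = [(yi, [1.0, qi, float(mi)])
            for yi, qi, mi in zip(awned, q, marker) if mi is not None]
    beta, cov = fit_logistic([r[1] for r in rows], [r[0] for r in rows])
    z = beta[2] / math.sqrt(cov[2][2])
    return beta[2], math.erfc(abs(z) / math.sqrt(2.0))


def tolerance(q, m):
    """Tolerance 1 - r^2 between Q and M (identical for both of two predictors)."""
    n = len(q)
    mq, mm = sum(q) / n, sum(m) / n
    sqm = sum((a - mq) * (b - mm) for a, b in zip(q, m))
    sqq = sum((a - mq) ** 2 for a in q)
    smm = sum((b - mm) ** 2 for b in m)
    return 1.0 - sqm * sqm / (sqq * smm)
```

Calling association_test(awned, q, indicators[allele]) for each retained allele of each marker yields the raw P-values that then go into the multiple-testing correction. Perfectly separated data (an allele carried only by awned varieties) has no finite maximum-likelihood estimate, so a real pipeline would need to detect that case.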
To reduce the workload, we performed the association analysis in two steps. We first conducted a preliminary association scan using 24 markers distributed evenly on Chr 4 (Figure 1), and then increased the marker density around the significantly associated loci from the preliminary scan and carried out fine association mapping. In the preliminary scan, six loci were significantly associated with rice awnness (P<0.01), and five of these had a false discovery rate below 0.05 after correction for multiple testing (Figure 1 and Table 2). Our association analysis detected most of the QTLs previously reported on Chr 4 (Figure 1). For example, marker in34 lies in the region of Awn4.1 and close to qAl4-1, and RM5320 maps to the region of An5, close to An1 and qAL4-2. In addition, we found three new loci associated with awnness on Chr 4. This suggests that association mapping can achieve higher gene-mapping efficiency than linkage analysis, which requires a segregating population.
Of the associated loci described above, in34 falls in the region of a QTL identified in the BC5F2 segregating population, whose donor and recurrent parents were the awned Gaoli upland rice and the awnless Nipponbare, respectively. We conducted fine association mapping in this region by designing 29 additional markers within the 4-Mb region around in34, using the same approach as in the preliminary scan. In addition to in34, six markers were significantly associated with awnness (P<0.01, P_adj<0.05) (Table 2, Figure 2). These awnness-associated markers (in60, indel114, cmm1157, in34, in33, in26 and in24; Table 2 and Figure 2) spanned an 870-kb fragment, which made it impossible, by association analysis alone, to determine which of the seven markers was closest to the awn gene or to narrow the mapping region further. We therefore turned to linkage analysis in a specific segregating population.
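The false-discovery-rate control applied in both association scans can be illustrated with the Benjamini-Hochberg step-up procedure. This is an assumption: the text cites [26] and Proc Multtest without naming the adjustment, and Proc Multtest offers several, so P_adj may have been computed differently. The function name and the raw p-values below are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw = [0.01, 0.04, 0.03, 0.005]  # made-up raw p-values for four markers
adj = bh_adjust(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

Because each adjusted value is scaled by m/rank, testing more markers in a scan makes the correction stricter, which is one reason to scan coarsely first and add markers only around associated loci.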
In the BC5F2 population, awnness segregated in a 3:1 ratio (148 awned plants : 52 awnless plants; χ² = 0.054), suggesting that awnness in this population is controlled by a single dominant gene (Figure 3). Using bulked segregant analysis (BSA) [28], five (RM1236, cmm1061, cmm1304, in60 and in34) of the 53 markers on Chr 4 were polymorphic between the awned and awnless bulks. Using MapMaker/EXP3.0 [29], we mapped the awn gene to a 4.5-cM interval between in60 and in34, 1.2 cM from the former and 3.3 cM from the latter (Figure 4). In this interval, four markers (in60, indel114, cmm1157 and in34) are significantly associated with the awnness of rice grains (Figure 2). Linkage mapping (Figure 4) and association mapping (Figure 2) show that indel114 and cmm1157 are closer to the mapped gene than in60 and in34. The candidate gene region can therefore be narrowed down to a 330-kb region between indel114 and cmm1157.

Figure 2 Fine association mapping in the 4-Mb region centered on in34 on Chr 4. Physical distances (in units of 100 kb) between adjacent loci are shown at the bottom. The number following the "-" in each marker name denotes its genotype. -log(P_raw), -log(P_fdr) and -log(P_adj) are the negative base-10 logarithms of P_raw, P_fdr and P_adj, respectively. The black rectangle marks the final candidate region containing Awn4.1.
Figure 3 The panicle phenotype of Nipponbare (middle) and of near-isogenic awned (left) and awnless (right) lines.
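The 3:1 segregation test in the BC5F2 population is a Pearson chi-square goodness-of-fit test with one degree of freedom. The sketch below computes the uncorrected statistic (the function name is hypothetical). For 148:52 it gives 2²/150 + 2²/50 ≈ 0.107 rather than the reported 0.054, since the formulation used in the study is not stated, but either value lies far below the 5% critical value of 3.841, so the conclusion of a 3:1 ratio is unaffected.

```python
def chi_square_gof(observed, ratio):
    """Pearson chi-square statistic for observed counts vs an expected ratio."""
    total = sum(observed)
    expected = [total * r / sum(ratio) for r in ratio]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


CRITICAL_5PCT_DF1 = 3.841  # chi-square critical value, 1 df, alpha = 0.05

chi2 = chi_square_gof([148, 52], [3, 1])
print(round(chi2, 3), chi2 < CRITICAL_5PCT_DF1)  # 0.107 True
```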
How can the significantly associated loci outside the 330-kb candidate region be explained? Are they false positive associations caused by linkage disequilibrium between markers rather than by association with the awn gene, or do they indicate the existence of other awn-related genes? To address these questions, we analyzed the linkage disequilibrium among the seven significantly associated loci (in60, indel114, cmm1157, in34, in33, in26 and in24) using Tassel (http://www.maizegenetics.net/tassel/). Significant pairwise LD between markers (R² > 0.1, P < 0.01) was observed in both the indica and japonica subpopulations (Table 3). This suggests that the significant associations at the loci outside the candidate region (in60, in34, in33, in24 and in26) are caused by linkage disequilibrium with neighboring loci and do not imply the existence of further awn genes.
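Pairwise LD of the kind summarized in Table 3 can be sketched as follows, assuming the varieties are inbred so that each genotype can be treated as a haplotype (alleles scored 0/1, missing calls as None). Here r² = D² / (p_A(1−p_A)·p_B(1−p_B)) with D = p_AB − p_A·p_B. Significance is approximated by n·r² ~ χ²(1); this is an assumption, since Tassel may use a different test such as Fisher's exact test. The function name is hypothetical.

```python
import math


def ld_r2(locus_a, locus_b):
    """Return (r2, p, n) for two biallelic loci scored 0/1 in inbred lines.

    Missing calls (None) at either locus are skipped pairwise. The p-value
    uses the chi-square approximation n * r2 ~ chi2(1).
    """
    pairs = [(a, b) for a, b in zip(locus_a, locus_b)
             if a is not None and b is not None]
    n = len(pairs)
    p_a = sum(a for a, _ in pairs) / n
    p_b = sum(b for _, b in pairs) / n
    p_ab = sum(a * b for a, b in pairs) / n  # frequency of the 1-1 haplotype
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("a locus is monomorphic; r2 is undefined")
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium D
    r2 = d * d / denom
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(n * r2 / 2.0))
    return r2, p, n
```

Computing r² separately within the indica and japonica subsets, as in Table 3, avoids LD that merely reflects allele-frequency differences between the two subpopulations.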
In summary, the logistic regression model incorporating population structure proved reliable and efficient for association analysis of genes regulating complex traits, and combining association and linkage analyses further increases mapping efficiency and accuracy. When a QTL region obtained by linkage analysis is large (e.g. 5 cM) and no segregating population or polymorphic markers are available, association mapping in a natural population can be used to fine map it. Additionally, fine association mapping combined with linkage mapping can exclude false positive associated loci caused by high linkage disequilibrium. Our preliminary association analysis detected five loci (in34, RM5320, RM1153, RM1112 and RM1272) associated with awnness on Chr 4. Further fine mapping combining association and linkage analyses located Awn4.1 within a 330-kb fragment. In addition, multicollinearity analysis of the population structure and markers suggested that the parameter estimates of the logistic regression model using population structure as a covariate are reliable. To reduce the risk of false positives, we also calculated the false discovery rate.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Supporting Information
The supporting information is available online at csb.scichina.com and www.springerlink.com. The supporting materials are published as submitted, without typesetting or editing. The responsibility for scientific accuracy and content remains entirely with the authors.