Development of a Research Platform for Dissecting Phenotype–Genotype Associations in Rice (Oryza spp.)
We present an overview of a research platform that provides essential germplasm, genotypic and phenotypic data and analytical tools for dissecting phenotype–genotype associations in rice. These resources include a diversity panel of 400 Oryza sativa and 100 Oryza rufipogon accessions that have been purified by single seed descent, a custom-designed Affymetrix array consisting of 44,100 SNPs, an Illumina GoldenGate assay consisting of 1,536 SNPs, and a suite of low-resolution 384-SNP assays for the Illumina BeadXpress Reader that are designed for applications in breeding, genetics and germplasm management. Our long-term goal is to empower basic research discoveries in rice by linking sequence diversity with physiological, morphological, and agronomic variation. This research platform will also help increase breeding efficiency by providing a database of diversity information that will enable researchers to identify useful DNA polymorphisms in genes and germplasm of interest and convert that information into cost-effective tools for applied plant improvement.
Keywords: Genome-wide association mapping; Rice phenotyping; SNP genotyping; Linkage disequilibrium; Chromosome segment substitution lines; Oryza rufipogon; Oryza sativa
Only a small fraction of the naturally occurring genetic diversity available in rice germplasm repositories around the world has been explored to date. This is beginning to change with the advent of affordable, high-throughput genotyping approaches coupled with robust statistical analysis methods that make it possible to examine genome-wide patterns of natural variation and link sequence polymorphism with complex trait variation. Association mapping offers one way of identifying genes and quantitative trait loci (QTL) underlying quantitatively inherited variation in both plants and animals based on the analysis of diverse collections of wild and domesticated strains. This approach is opening the door to new forms of collaboration aimed at discovering gene function and identifying under-utilized alleles and allele combinations that can be used to drive improvements in crop performance (Ersoz et al. 2009; Flint-Garcia et al. 2005; Richards et al. 2009; Zhu et al. 2008). Like traditional QTL mapping, genome-wide association mapping relies on the strength of linkage disequilibrium (LD), in this case across a diverse population, and aims to identify relationships between markers and functional polymorphisms that determine traits of agronomic, aesthetic, and evolutionary interest. However, it does so in the context of evolutionary biology and population genetics (Clark et al. 2007; Zhao et al. 2007), providing an open-ended opportunity to mine existing breeding populations and germplasm collections for valuable alleles that have not been previously captured in plant breeding programs (Tanksley and McCouch 1997).
In this paper, we present an overview of the development of a research platform that provides essential germplasm, genotypic and phenotypic data, and analytical tools for dissecting phenotype–genotype associations in rice. These resources include a diversity panel consisting of 400 Oryza sativa and 100 Oryza rufipogon accessions that have been purified by single seed descent, a custom-designed Affymetrix array consisting of 44,100 SNPs, an Illumina GoldenGate assay consisting of 1,536 SNPs (Zhao et al. 2010), and a suite of low-resolution 384-SNP assays for the Illumina BeadXpress Reader that are designed for applications in breeding, genetics and germplasm management. We are developing a database to provide access to the SNP diversity data on our panel of germplasm (44,100 SNP genotypes × 500 rice accessions); the data will be available for download from our project web site (www.ricediversity.org) and through the Gramene database (www.gramene.org). The diversity panel has also been phenotyped for a core set of morphological and developmental traits under both controlled and field conditions (Ali et al. 2010a; Kovach et al. 2009; Sweeney et al. 2007; Takano-Kai et al. 2009). In parallel, we have developed novel computational methods for SNP selection, allele calling, and quality control that incorporate inbreeding information on a per-sample basis, allowing accurate genotyping of both inbred and heterozygous samples (Wright et al. 2010). We have used our SNP diversity dataset to develop and optimize several 384-SNP genotyping assays for use in applied breeding programs. We are simultaneously developing six interspecific libraries of chromosome segment substitution lines (CSSLs) to complement the arsenal of mapping populations and other genetic resources available for validating QTLs and unraveling the gene networks underlying complex trait variation in wild and cultivated rice.
Our long-term goal is to empower basic research discoveries in rice by linking sequence diversity with physiological, developmental, morphological and agronomic variation. This research platform will help increase breeding efficiency by providing a database of diversity information that will enable researchers to readily identify useful DNA polymorphisms in genes or germplasm samples of interest and convert that information into cost-effective tools for basic biological inquiry and applied plant improvement.
Use of low-, medium-, and high-resolution SNP assays and re-sequencing strategies
Different types of research require different tool kits to be successful. The rice research community supports a wide spectrum of both basic and applied activities aimed at understanding and utilizing natural variation. These include evolutionary biology and population genetics inquiries, QTL mapping and gene discovery, allele mining and germplasm management, and pre-breeding and variety development, among others. In addition to suitability for one’s research interests, the choice of genotyping platform may depend on cost efficiency, turn-around time, throughput, information content, and ease of use and analysis.
By developing a set of SNP genotyping platforms that vary in resolution and in species/population specificity, we aim to facilitate different types of research focused on the exploration and utilization of natural variation in rice. In addition to SNP-detection platforms, second- and third-generation re-sequencing approaches are rapidly coming down in cost, but they are currently ten times more expensive than fixed arrays (such as Affymetrix SNP-detection arrays with 44,000 or 950,000 SNPs) and require significantly more bioinformatics expertise for data generation and analysis. High-resolution SNP-detection platforms are useful for generating large databases of information about SNP diversity on hundreds or thousands of lines from a germplasm collection or breeding program. Fixed arrays with >40,000 SNPs are sufficient for genome-wide association mapping in rice (see the section on LD) and provide insight into population substructure, while chips with 950,000 SNPs support gene discovery and functional genomics research and offer a powerful tool for allele mining in germplasm collections. Data generated with high-resolution SNP chips also provide essential information that enables rapid development and deployment of low-resolution assays that are technically easier and more economical to run.
To be attractive to the breeding and genetics communities, low-resolution SNP-detection assays must be inexpensive, high-throughput and accurate, and must require little technical investment. Low-resolution assays are generally custom-designed for a particular population and may be used to assay hundreds or thousands of individuals within a short time window. Genotyping platforms such as the Illumina BeadXpress Reader provide rapid, low-cost 96- or 384-plex SNP genotyping for a variety of applications, including primary QTL analysis, NIL development and backcross conversion, varietal identification, quality control in the market or as part of germplasm management, verification of outcrossing or inbreeding in a hybrid rice program, and fine mapping of a target region in a gene-discovery program. At a higher resolution, the 1,536-SNP GoldenGate assay is ideally suited for genomic selection in a breeding program. At this level of resolution, the assay is likely to detect polymorphism across a range of individuals with varying levels of relatedness, and if it is well designed to capture the essential components of variation in the breeders’ gene pool, it can be used on large numbers of individuals over many years.
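As a hedged illustration of one backcross-conversion use of such a panel, the sketch below estimates how much of the recurrent-parent genome a backcross line has recovered. The function name `rp_recovery`, the "A"/"B"/"H" call coding, and the half-credit rule for heterozygotes are assumptions made for illustration; they are not part of any Illumina software.

```python
# Hypothetical sketch: estimate recurrent-parent (RP) genome recovery in a
# backcross line from a low-resolution SNP panel. Calls are "A"/"B" for the
# two homozygous classes, "H" for heterozygous, and None for a failed call.

def rp_recovery(line_calls, rp_calls, donor_calls):
    """Return the fraction of informative markers carrying RP alleles.

    Only SNPs that are polymorphic between the recurrent parent and the
    donor (and called in all three genotypes) are informative. A
    heterozygous call counts as half a recurrent-parent allele.
    """
    score = 0.0
    informative = 0
    for g, rp, dn in zip(line_calls, rp_calls, donor_calls):
        if g is None or rp is None or dn is None or rp == dn:
            continue  # skip failed calls and SNPs monomorphic between parents
        informative += 1
        if g == rp:
            score += 1.0
        elif g == "H":
            score += 0.5
    return score / informative if informative else float("nan")
```

In practice, breeders often weight markers by the chromosomal interval each one represents; the unweighted fraction here assumes roughly even marker spacing.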
Development of Illumina and Affymetrix SNP genotyping assays
1,536-SNP Illumina GoldenGate assay: SNPs were selected from a SNP discovery pool generated by re-sequencing 20 diverse landraces of rice using Perlegen re-sequencing technology (www.oryzasnp.org; McNally et al. 2009). The 20 O. sativa accessions that were re-sequenced to form the discovery dataset included seven indica, three aus, five tropical japonica, four temperate japonica, and a single accession each of Group V (aromatic), Aswina and Rayada (McNally et al. 2006; see the discussion of O. sativa population structure below). SNPs were chosen from the discovery pool in a stepwise manner (Zhao et al. 2010). In the first stage, we used the Illumina design tool to obtain a designability score for virtually all available SNPs. Among SNPs in perfect LD with other SNPs within 500 kb, we prioritized those with high frequency scores in the tropical japonica, temperate japonica, and indica and/or aus subpopulations. Site frequency spectra could be calculated, and SNP variation targeted, only within these four major subpopulations, because re-sequencing information was available for only one accession each of Aswina, Rayada and Group V. The final selection comprised 1,536 well-distributed SNPs (Zhao et al. 2010). A map showing their distribution on the 12 chromosomes of rice is presented in Fig. 1.
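The prioritization among SNPs in perfect LD can be pictured with a minimal greedy sketch. It assumes that "perfect LD" means identical genotype patterns (r² = 1) across the discovery accessions with no missing data, and it collapses the designability and subpopulation-frequency criteria into a single hypothetical `score` field. It is not the selection code used by Zhao et al. (2010).

```python
def canonical(pattern):
    """Relabel alleles by order of first appearance, so patterns that differ
    only in allele labels (e.g. "AAGG" vs "CCTT") compare equal; for
    biallelic SNPs such patterns are in perfect LD (r^2 = 1)."""
    labels = {}
    return tuple(labels.setdefault(a, len(labels)) for a in pattern)


def select_tag_snps(snps, window_bp=500_000):
    """Greedy tag-SNP selection (hypothetical helper).

    snps: list of dicts with keys 'id', 'chrom', 'pos', 'score', 'pattern',
    where 'pattern' is the string of calls across discovery accessions.
    SNPs are visited from highest to lowest score; a SNP is dropped if an
    already chosen SNP on the same chromosome, within window_bp, has the
    same canonical pattern.
    """
    chosen = []
    for snp in sorted(snps, key=lambda s: s["score"], reverse=True):
        key = canonical(snp["pattern"])
        redundant = any(
            c["chrom"] == snp["chrom"]
            and abs(c["pos"] - snp["pos"]) <= window_bp
            and canonical(c["pattern"]) == key
            for c in chosen
        )
        if not redundant:
            chosen.append(snp)
    return sorted(chosen, key=lambda s: (s["chrom"], s["pos"]))
```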
Development of the 44,100-SNP Affymetrix custom array: the array was designed to cover the rice genome at ~1 SNP/10 kb, a density that was expected to support association mapping (see “Estimates of linkage disequilibrium” below for the rationale). Relative to the 1,536-SNP assay, the 44 K array also expanded our database of polymorphisms for the 500 accessions in our diversity panel by almost 30-fold.
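The target density and the fold increase follow from simple arithmetic, assuming a genome size of ~400 Mb (the figure used in the LD section of this paper):

```latex
\frac{400\ \text{Mb}}{10\ \text{kb per SNP}} = \frac{4\times10^{8}}{10^{4}} = 4\times10^{4}\ \text{SNPs},
\qquad
\frac{44{,}100}{1{,}536} \approx 28.7
```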
We again selected SNPs from the OryzaSNP dataset described above (McNally et al. 2009) and supplemented them with SNPs discovered from BAC-end sequences of two O. rufipogon accessions generated by the OMAP project (www.omap.org; Ammiraju et al. 2006). The BAC-end sequences had been aligned to the Nipponbare genome (www.gramene.org). Using criteria similar to those used for the 1,536-SNP chip, we identified tag-SNPs with a conservative tagging window of 50 kb and eliminated SNPs that had more than one hit in the genome and/or more than one mismatch. A total of 1,243 reliable SNPs from the Illumina SNP chip were included in the Affymetrix array for cross-platform validation. SNPs from the OMAP data were used to fill gaps >20 kb between the tag-SNPs. Finally, we added 4,000 SNPs from four highly polymorphic regions (two on chr 1, one on chr 3, and one on chr 11) to provide finer-scale evaluation of LD decay in specific regions based on genotyping in our diversity panel (Fig. 1).
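The gap-filling step can be sketched for a single chromosome as follows. This is a minimal illustration under stated assumptions: positions are plain base-pair integers, candidate positions stand in for the OMAP-derived SNPs, and the "closest to the gap midpoint" rule and the function name `fill_gaps` are hypothetical choices, not the project's documented procedure.

```python
import bisect


def fill_gaps(tag_positions, candidate_positions, max_gap=20_000):
    """Add candidate SNPs into gaps > max_gap bp between consecutive tag SNPs.

    Greedy: while a gap exceeds max_gap, insert the candidate closest to the
    gap midpoint and re-examine the two resulting sub-gaps; move on when no
    candidate lies strictly inside the gap. Returns the added positions.
    """
    tags = sorted(tag_positions)
    cands = sorted(set(candidate_positions) - set(tags))
    added = []
    i = 0
    while i < len(tags) - 1:
        left, right = tags[i], tags[i + 1]
        if right - left <= max_gap:
            i += 1
            continue
        lo = bisect.bisect_right(cands, left)   # first candidate > left
        hi = bisect.bisect_left(cands, right)   # first candidate >= right
        if lo >= hi:
            i += 1  # no candidate available inside this gap
            continue
        mid = (left + right) / 2
        best = min(cands[lo:hi], key=lambda p: abs(p - mid))
        tags.insert(i + 1, best)  # keeps tags sorted; same i re-checks left sub-gap
        cands.remove(best)
        added.append(best)
    return sorted(added)
```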
Based on genotyping of the ~400 O. sativa samples in our diversity panel, >90% of the 44,100 SNPs on the array passed our QC criteria, which were based on expected error rates computed from posterior call probabilities (Wright et al. 2010). For the SNPs that passed QC, the inbred samples run to date have a median call rate of 95.9%. As with the Illumina array, control samples show a high degree of agreement with the published Nipponbare and 93-11 genome sequences used as external validation data. In addition, technical replicates of the control samples run on each plate showed >99% average pairwise concordance and >92% average call rate.
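The two replicate metrics can be computed as below. This is a hedged sketch: it assumes that concordance is measured only over SNPs called in both replicates, which is a common convention but is not spelled out above, and the function names are illustrative.

```python
def call_rate(calls):
    """Fraction of SNPs with a non-missing call (None = no call)."""
    return sum(c is not None for c in calls) / len(calls)


def concordance(rep1, rep2):
    """Agreement between two technical replicates, computed only over SNPs
    that were called in both replicates (an assumed convention)."""
    both = [(a, b) for a, b in zip(rep1, rep2) if a is not None and b is not None]
    if not both:
        return float("nan")
    return sum(a == b for a, b in both) / len(both)
```

Averaging `concordance` over all replicate pairs and `call_rate` over all control samples yields plate-level summaries of the kind reported above.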
Development of 384-SNP assays for applications in genetics and breeding: the availability of re-sequencing data on an increasing number of accessions, together with SNP diversity data on 400 O. sativa samples, provides a database of critical information about the frequency of individual SNPs in different subpopulations or varieties. We have used this information to develop a suite of “breeders’ chips” in collaboration with colleagues at IRRI and in the USDA-ARS. Each breeders’ chip consists of 96 or 384 SNPs that can be used to assay large numbers of lines from segregating populations economically and rapidly on Illumina’s BeadXpress platform.
The SNP chips we have designed to date provide optimal genome coverage across the 12 chromosomes of rice or targeted coverage across specific regions of the genome. SNPs are selected to target polymorphisms in specific pairs of parents, within or between different subpopulations. For example, 384-SNP chips have been optimized for indica × japonica populations commonly used for QTL mapping, others for indica × aus or tropical × temperate japonica breeding populations and some for indica × O. rufipogon or japonica × O. rufipogon populations.
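A parent-specific panel of this kind can be sketched in two steps: keep SNPs that are called and differ between the two parents, then thin them to a fixed number per chromosome. In this minimal sketch the thinning is even by rank along each chromosome rather than by physical distance, and the function name, data layout and `per_chrom` parameter are hypothetical.

```python
from collections import defaultdict


def design_panel(snps, parent_a, parent_b, per_chrom):
    """Pick up to per_chrom parent-polymorphic SNPs on each chromosome.

    snps: list of dicts with 'id', 'chrom', 'pos' and 'calls', where 'calls'
    maps an accession name to its homozygous allele (or None if missing).
    """
    by_chrom = defaultdict(list)
    for s in snps:
        a, b = s["calls"].get(parent_a), s["calls"].get(parent_b)
        if a is not None and b is not None and a != b:  # polymorphic between parents
            by_chrom[s["chrom"]].append(s)
    panel = []
    for chrom in sorted(by_chrom):
        cands = sorted(by_chrom[chrom], key=lambda s: s["pos"])
        if len(cands) <= per_chrom:
            panel.extend(cands)
            continue
        # Evenly spaced ranks from the first to the last candidate.
        step = (len(cands) - 1) / (per_chrom - 1) if per_chrom > 1 else 0
        idx = sorted({round(k * step) for k in range(per_chrom)})
        panel.extend(cands[i] for i in idx)
    return [s["id"] for s in panel]
```

Spacing by physical position or by genetic distance would be a natural refinement when marker density varies strongly along a chromosome.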
Our collaboration with USDA-ARS colleagues required that we identify a set of genome-wide SNPs that could differentiate elite US tropical japonica lines as the basis for mapping QTLs from within this narrow gene pool. As part of the RiceCAP project (http://www.ricecap.uark.edu/), nine US tropical japonica genomes were re-sequenced to generate a large SNP discovery pool, providing the base for the selection of a set of well-distributed SNPs that were used to build a 384-SNP chip for US breeders. Colleagues in Japan have undertaken a similar strategy for developing SNP assays that allow rice breeders in Japan to trace the inheritance of genome segments across elite Japanese temperate japonica varieties (Nagasaki et al. 2010; Yamamoto et al. 2010). The rapidly expanding reservoir of SNP-based diversity information provides a critical resource for developing a wide array of useful genotyping tools for applications in genetics, germplasm management, and plant improvement.
The genotypic and phenotypic diversity datasets from this project will be publicly available as downloadable PLINK files (Purcell et al. 2007). We are currently designing an application that will automate the SNP selection process so that users can build customized 96- or 384-SNP arrays based on their lines of interest and preferred SNP density across the genome. The SNP selection tool will be available from our project website (www.ricediversity.org), and a SNP query tool is now available in the Gramene database (www.gramene.org). All data from this project will also be integrated into the Genetic Diversity Module of the Gramene database, where they can be viewed in the context of other diversity and genomic information available in plants (http://www.gramene.org/db/diversity/diversity_view).
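For readers unfamiliar with PLINK's text formats, the sketch below shows how inbred-line calls could be laid out as .ped and .map content. It follows the standard PLINK 1.x conventions (six leading .ped columns, "0 0" for a missing genotype, -9 for a missing phenotype, four .map columns), but the function name and input layout are hypothetical, and the files released by the project may differ in details such as family IDs or genetic distances.

```python
def to_plink_text(samples, snps, genotypes):
    """Build the contents of PLINK text-format .ped and .map files.

    samples: accession IDs; snps: (chrom, snp_id, bp_position) tuples;
    genotypes[i][j]: call for sample i at SNP j, given as a single base for a
    homozygous inbred call ("A"), two bases for a heterozygote ("AG"), or
    None when the call failed.
    """
    # .map columns: chromosome, SNP ID, genetic distance (0 = unknown), bp position
    map_lines = [f"{chrom}\t{snp_id}\t0\t{pos}" for chrom, snp_id, pos in snps]
    ped_lines = []
    for sample, row in zip(samples, genotypes):
        alleles = []
        for call in row:
            if call is None:
                alleles += ["0", "0"]        # PLINK's missing-genotype code
            elif len(call) == 1:
                alleles += [call, call]      # homozygous call from an inbred line
            else:
                alleles += [call[0], call[1]]
        # Family ID, individual ID, father, mother, sex (0 = unknown), phenotype (-9 = missing)
        ped_lines.append(" ".join([sample, sample, "0", "0", "0", "-9", *alleles]))
    return "\n".join(ped_lines) + "\n", "\n".join(map_lines) + "\n"
```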
Development of the rice diversity panel
A collection of ~400 diverse O. sativa accessions from 79 countries, including both landraces and elite varieties, along with ~100 accessions of O. rufipogon from 14 countries in Asia, forms the basis of the germplasm diversity panel in this study. These accessions were selected to represent the range of geographic and genetic diversity of the species (Garris et al. 2005; Yan et al. 2007; K. McNally, pers. comm.; J. Jung, pers. comm.) and include 18 varieties used for SNP discovery in the OryzaSNP dataset (McNally et al. 2009), 150 from the study by Garris et al. (2005), 159 from the USDA-ARS NSGC rice core collection (Yan et al. 2007), and 16 lines previously used as mapping parents in published QTL studies (www.gramene.org). Information about the accessions is available in Ali et al. (2010a) and at the project website, www.ricediversity.org.
Purification of seed stocks
Population structure in O. sativa
Table summarizing the number of accessions assigned to each varietal group and subpopulation based on evaluation with 36 SSRs, 1,536 SNPs or 44,100 SNP markers. Accessions were assigned to groups with STRUCTURE, using ancestry coefficients of >60%, >80% and >90%, and with PCA.
Columns: SSR classification (>60%) | 44 K SNP groups by STRUCTURE (>60%) | 44 K SNP groups by STRUCTURE (>80%) | 44 K SNP groups by STRUCTURE (>90%) | 44 K SNP groups by PCA
Row: aromatic (Group V)
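Classification by ancestry coefficient, as described in the table caption, amounts to a threshold rule on each accession's STRUCTURE membership vector. The sketch below is a minimal, hypothetical version: an accession is assigned to its highest-ancestry group only if that coefficient exceeds the chosen cutoff, and is otherwise labeled "admixed". Raising the cutoff from 60% to 90% moves more accessions into the admixed class.

```python
def classify(q_row, groups, threshold=0.8):
    """Assign an accession to the group with the largest ancestry
    coefficient if it exceeds threshold; otherwise call it admixed.

    q_row: ancestry coefficients (summing to ~1), in the same order as groups.
    """
    best = max(range(len(groups)), key=lambda k: q_row[k])
    return groups[best] if q_row[best] > threshold else "admixed"
```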
In O. sativa, the deep population structure poses significant challenges for association mapping because functional variation that is highly correlated with subpopulation structure cannot be distinguished from all other (non-causative) subpopulation-correlated genetic variation. Thus, population structure limits the power of association mapping where a trait is highly correlated with the degree of genetic relatedness. Subpopulation structure also sub-divides the primary gene pool of rice into highly differentiated germplasm groups and in some groups there is a paucity of historical recombination that limits mapping resolution (i.e., in the temperate japonica and aromatic subpopulations; Myles et al. 2009; Rakshit et al. 2007). As a result, much larger sample sizes are needed to provide the power required for associating genotype with many traits of interest to the rice community, particularly where the genetic architecture of the trait differs across subpopulations (Sneller et al. 2009).
Population structure in O. rufipogon
The population structure of O. rufipogon is less well-defined than that of O. sativa. O. rufipogon is a species complex, often referred to as Oryza perennis (Vaughan et al. 2008). It consists of both perennial and annual forms which are referred to as O. rufipogon and Oryza nivara, respectively (Morishima et al. 1984). However, a lack of reproductive isolation, coupled with evidence of continuous variation, substantiates the view that these annual and perennial forms are more accurately viewed as distinct ecotypes of O. rufipogon (Barbier et al. 1991; Lu et al. 2002; Oka 1988; Zhu and Ge 2005; Zhu et al. 2007). For most of the twentieth century, O. rufipogon held little interest for rice breeders or producers, except as a weed to be eliminated from their fields, while rice geneticists recognized it as a source of cytoplasmic male sterility, disease and insect resistance (Brar and Khush 1997; Song et al. 1995). More recently, it has also been shown to be an important reservoir of useful genes for enhancing yield (Marri et al. 2005; Cho et al. 2003; Xie et al. 2008; McCouch et al. 2007). Molecular marker-based analysis has also been used to characterize the population structure of O. rufipogon and relate it to geographic and ecological differentiation (Liu et al. 2007; Londo et al. 2006; Vaughan et al. 2008; Wang et al. 2008). However, due to the use of different accessions by different groups, the global subpopulation structure of this wild species remains poorly defined. An international effort to coordinately analyze a geographically and ecologically diverse collection of wild germplasm will be required to fully document the structure of existing O. rufipogon populations.
Estimating the number of SNPs needed for genome-wide association mapping in rice
The number of markers needed to perform genome-wide association mapping is determined by the extent of LD, or allelic association, in the species or population(s) under investigation. LD is defined as the nonrandom association of alleles at different loci in a population (Flint-Garcia et al. 2003). It is measured as the strength of correlation between polymorphisms (i.e., SNPs) caused by their shared history of recombination. Levels of LD are increased when polymorphisms are correlated as a result of linkage, selection, and/or admixture, while recombination and independent assortment decrease levels of LD. Because effective recombination is lower in naturally self-pollinating species, compared to outcrossing species, inbreeding species such as rice tend to have extensive genome-wide LD.
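Because rice accessions in the panel are inbred, each accession effectively contributes one haplotype, so the usual LD statistic r² = D² / [p_A(1 − p_A) p_B(1 − p_B)], with D = p_AB − p_A p_B, can be computed directly from paired SNP calls. The minimal sketch below assumes biallelic SNPs and fully homozygous calls (heterozygous calls would need phasing and are not handled); the function name is illustrative.

```python
def ld_r2(snp1, snp2):
    """r^2 between two biallelic SNPs scored on homozygous (inbred) lines.

    Each argument is a list of allele calls, one per accession; accessions
    missing at either SNP are dropped. Each remaining accession is treated
    as a single haplotype.
    """
    pairs = [(a, b) for a, b in zip(snp1, snp2) if a is not None and b is not None]
    n = len(pairs)
    if n == 0:
        return float("nan")
    a1, b1 = pairs[0]  # reference alleles at SNP 1 and SNP 2 (arbitrary choice)
    p_a = sum(a == a1 for a, _ in pairs) / n
    p_b = sum(b == b1 for _, b in pairs) / n
    p_ab = sum(a == a1 and b == b1 for a, b in pairs) / n
    d = p_ab - p_a * p_b                      # coefficient of LD, D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else float("nan")  # undefined if monomorphic
```

The choice of reference allele does not matter, because r² is unchanged when allele labels are swapped.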
The genomic distance over which LD persists will determine the number and density of markers needed to perform a genome-wide association analysis. If LD decays within a short distance, mapping resolution is expected to be high, but a large number of markers are required. If LD extends over a long distance, then mapping resolution will be low, but a relatively small number of markers are required for genome-wide association studies.
Estimates of linkage disequilibrium
Estimates of LD in O. sativa are ~100 kb in indica and aus (Garris et al. 2003) and longer in tropical japonica, temperate japonica and Group V (aromatic) (Mather et al. 2007; Garris et al. 2003). LD decays more quickly in O. rufipogon (~30–50 kb; Rakshit et al. 2007), where levels of outcrossing are 20–30%, compared with only ~2–3% in O. sativa. Based on the 44 K SNP data generated in this project, we observe rates of LD decay in these species and subpopulations similar to those reported previously (K. Zhao, pers. comm.).
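Decay distances like these are typically read off a plot of pairwise r² against physical distance. A simple, hedged way to summarize that plot is to bin SNP pairs by distance and report the first bin whose mean r² falls below a cutoff. The 10 kb bin width and the r² < 0.1 cutoff below are assumptions; published studies use various criteria, such as half of the maximum r² or a fixed cutoff, so values are comparable only under a common definition.

```python
from collections import defaultdict


def ld_decay_distance(pairs, bin_bp=10_000, threshold=0.1):
    """pairs: iterable of (distance_bp, r2) for SNP pairs on the same chromosome.

    Returns the lower edge (bp) of the first distance bin whose mean r2 is
    below threshold, or None if no bin falls below it.
    """
    bins = defaultdict(list)
    for dist, r2 in pairs:
        bins[dist // bin_bp].append(r2)
    for b in sorted(bins):
        if sum(bins[b]) / len(bins[b]) < threshold:
            return b * bin_bp
    return None
```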
Because many SNPs occur at high frequency in some but not all subpopulations, and some SNPs are specific to a particular subpopulation, the number of SNPs required to “tag” the majority of haplotype blocks in the rice genome is expected to be on the order of ~20,000–30,000. To achieve this density, and assuming a genome size of ~400 Mb, approximately 4,000 informative SNPs would be needed within each of the indica and aus subpopulations (~1 SNP per 100 kb), ~800 SNPs within each of the tropical japonica and temperate japonica subpopulations (~1 SNP per 500 kb), and a minimum of ~8,000–12,000 well-distributed SNPs within O. rufipogon. Using the combined OryzaSNP and OMAP SNP discovery pools, we identified a set of well-distributed SNPs showing within-population variation (frequency >10%) in each of the four major O. sativa subpopulations (indica, aus, tropical japonica, and temperate japonica) as well as in O. rufipogon.
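The per-subpopulation figures above follow a one-marker-per-LD-interval heuristic: divide the genome size by the distance over which LD persists. A minimal sketch of that arithmetic (the function name is illustrative) is:

```python
def markers_needed(genome_bp, ld_extent_bp):
    """Number of evenly spaced markers so that adjacent markers are no
    farther apart than the LD extent (integer ceiling division)."""
    return -(-genome_bp // ld_extent_bp)
```

For a ~400 Mb genome, `markers_needed(400_000_000, 100_000)` gives 4,000 (indica, aus) and `markers_needed(400_000_000, 500_000)` gives 800 (tropical and temperate japonica). The heuristic ignores uneven LD along chromosomes and allele-frequency filtering, which is why more SNPs are needed in practice.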
Our decision to build an Affymetrix array of 44,100 SNPs was based on the conservative assumption that not all of the SNP targets selected from the discovery pools would convert into reliable SNP-detection assays. In practice, ~82% of SNPs were successfully converted on both the Illumina GoldenGate and the Affymetrix custom arrays, and inter-convertibility between the two platforms was excellent.
Novel algorithms for allele calling and quality control
Current methods for automated allele calling are almost entirely based on clustering approaches that perform poorly when most samples are inbred or deficient in heterozygotes, as is typical of inbreeding species, including rice and several other important crops. This is largely because the default algorithms used by Illumina and Affymetrix were first developed to classify mammalian genotypes under the expectation that the population under consideration was in Hardy–Weinberg equilibrium. Thus, when the heterozygote cluster is underrepresented or completely absent (as is the case for inbred lines), the software cannot reliably identify cluster locations and boundaries unless a large dataset is generated to “train” the algorithm. Problems with the default allele-calling software became obvious as soon as we started to analyze the rice data.
To address this issue, we developed a genotype-calling approach called “ALCHEMY”, which is based on a statistical model of the process generating the data rather than on clustering methods (Wright et al. 2010). Because it does not depend on ad hoc or generalized clustering, this model-based algorithm can accept a priori specified inbreeding coefficients, allowing it to adjust for the expected frequency of heterozygotes in a sample. Simultaneous estimation and optimization of the inbreeding coefficient on a per-sample basis allows outbred and inbred samples to be analyzed together and improves both accuracy and call rates. The method also provides a posteriori quality scores on a per-SNP basis so that the reliability of specific SNPs can be evaluated. It can make an inference even if only a single sample is analyzed, although the model parameters are refined and optimized when several samples are available for simultaneous inference. ALCHEMY has been shown to achieve >99% accuracy with as few as six samples, with larger numbers of samples continuing to improve call rates. Details of the theory and utility of ALCHEMY for calling SNPs on both the Illumina and the Affymetrix platforms can be found in Wright et al. (2010). ALCHEMY is available as open source software (http://alchemy.sourceforge.net).
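The following toy example is not ALCHEMY's model (see Wright et al. 2010 for that). It only illustrates why a per-sample inbreeding coefficient F matters. Genotype priors use the standard inbreeding-adjusted frequencies P(AA) = p²(1 − F) + pF, P(AB) = 2pq(1 − F) and P(BB) = q²(1 − F) + qF. The one-dimensional Gaussian "allele contrast" intensity model, its centers, `sigma`, and the function names are all hypothetical simplifications.

```python
import math


def genotype_priors(p, F):
    """Genotype frequencies for allele frequency p under inbreeding coefficient F."""
    q = 1.0 - p
    return {
        "AA": p * p * (1 - F) + p * F,
        "AB": 2 * p * q * (1 - F),
        "BB": q * q * (1 - F) + q * F,
    }


# Hypothetical intensity model: a normalized allele contrast x in [-1, 1]
# whose expected value is +1 for AA, 0 for AB and -1 for BB.
CENTERS = {"AA": 1.0, "AB": 0.0, "BB": -1.0}


def call_genotype(x, p, F, sigma=0.3):
    """Return (most probable genotype, its posterior probability)."""
    priors = genotype_priors(p, F)
    unnorm = {
        g: priors[g] * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))
        for g, mu in CENTERS.items()
    }
    total = sum(unnorm.values())
    best = max(unnorm, key=unnorm.get)
    return best, unnorm[best] / total
```

With p = 0.5, an ambiguous contrast of x = 0.4 is called "AB" when F = 0 (Hardy–Weinberg priors) but "AA" when F = 0.98, because near-complete inbreeding makes heterozygotes a priori rare.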
Twelve phenotypes related to plant morphology and domestication were evaluated on ~100 greenhouse-grown accessions in both Stuttgart, AR, and Ithaca, NY, using purified seed stocks of O. rufipogon/O. nivara from the rice diversity panel. These phenotypes are currently being analyzed to identify trait–marker associations and will enable us to compare the resolution of trait dissection in O. sativa and O. rufipogon. We are also interested in evaluating haplotype diversity in O. rufipogon/O. nivara and comparing it with that of O. sativa to identify genomic regions that show footprints of selection associated with domestication and varietal differentiation. Regions associated with selective sweeps offer targets for future investigation aimed at identifying the genes and alleles underlying selection and at breaking up extensive linkage blocks through recombination. Development of interspecific and inter-subspecific populations has already been initiated and has great potential to invigorate populations of O. sativa and release new variation for long-term selection by breeders.
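One common way to look for such footprints, offered here only as a hedged sketch, is to compare diversity in wild and cultivated samples window by window. Regions where diversity in O. sativa is much lower than in O. rufipogon/O. nivara (a high wild-to-cultivated ratio) are candidate sweep regions. The sketch sums expected heterozygosity per SNP within windows as a simple diversity proxy. The 100 kb window, the function names and any ratio cutoff are assumptions, and SNP ascertainment bias is ignored.

```python
from collections import defaultdict


def expected_het(calls):
    """Expected heterozygosity 1 - sum(p_i^2) at one SNP (missing calls ignored)."""
    obs = [c for c in calls if c is not None]
    if not obs:
        return 0.0
    n = len(obs)
    return 1.0 - sum((obs.count(a) / n) ** 2 for a in set(obs))


def diversity_ratio(snps, window_bp=100_000):
    """snps: list of (chrom, pos, wild_calls, cultivated_calls).

    Returns {(chrom, window_start): wild diversity / cultivated diversity};
    windows with no cultivated diversity get float('inf').
    """
    wild, cult = defaultdict(float), defaultdict(float)
    for chrom, pos, w, c in snps:
        key = (chrom, pos // window_bp * window_bp)
        wild[key] += expected_het(w)
        cult[key] += expected_het(c)
    return {k: (wild[k] / cult[k] if cult[k] > 0 else float("inf")) for k in wild}
```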
Validation of QTLs identified by association mapping requires the development of multiple bi-parental mapping populations using parents from the association mapping panel. These populations may be complemented by access to mutant or functional genomics populations as well as more specialized resources such as libraries of introgression lines or CSSLs.
As part of this project, we are constructing six libraries of CSSLs using three diverse O. rufipogon/O. nivara donors from our diversity panel and two O. sativa recurrent parents, one indica and one tropical japonica. The donors were selected from different branches of the phylogenetic tree based on SNP- and SSR-based clustering of O. rufipogon/O. nivara. One of the donors clusters near indica, one clusters near japonica, and one is classified as independent because it does not cluster with any of the O. sativa subpopulations.
A set of CSSLs can be grown in different environments by different researchers and evaluated for multiple traits to determine whether particular genes, QTLs or chromosomal segments from the donor are responsible for trait variation in the recurrent parent background. Because the number of lines in a primary set of CSSLs is usually small (~96) and each introgression is quite large (~6–8 Mb, or 24–32 cM), the resolution of CSSL mapping is roughly equivalent to that of QTL mapping, but it is achieved with less than half the number of lines. Phenotypes observed in a CSSL population can be immediately mapped to a particular introgressed segment and are not complicated by differences in flowering time or other aspects of plant development that are generally observed when many donor introgressions segregate in the genetic background, as occurs with recombinant inbred lines. When CSSLs are used to validate QTLs detected by association or linkage mapping, the performance of a particular introgression line is compared with that of the recurrent parent and sib introgression lines, and this can readily be done in as many environments or under as many treatments as appropriate. Individual introgression lines provide an excellent starting point for positional cloning and are immediately useful as parents in a breeding program. This is particularly important when the donor is a wild accession: because each CSSL contains only a single introgressed segment of interest, the exotic introgression can easily be bred into other elite cultivars using marker-assisted selection.
Interspecific O. sativa × O. rufipogon populations have been used to identify offspring that outperform the better parent (positive transgressive segregants; McCouch et al. 2007; Tanksley and McCouch 1997). The fact that such segregants occur at relatively high frequency in BC2 populations, and that O. rufipogon-derived QTLs associated with the favorable effect can be identified in both inbred and hybrid backgrounds, has created strong interest in interspecific backcross breeding in rice. Libraries of CSSLs provide a powerful and efficient way to verify the impact of a target introgression, requiring only a comparison of the performance of the appropriate CSSL with that of its recurrent parent. When CSSL libraries do not exist, researchers generally have to go through several generations of backcrossing to develop QTL-NILs.
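For a single trial with replicate plants of one CSSL and its recurrent parent, that comparison can be as simple as Welch's two-sample t statistic, sketched below with the standard-library `statistics` module. This is one hedged option, not a procedure prescribed above. Multi-environment or multi-treatment trials would normally call for an ANOVA or mixed model, and the function name is illustrative.

```python
import math
import statistics


def welch_t(cssl, recurrent):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for the
    difference in mean phenotype between a CSSL and its recurrent parent.
    Each argument is a list of at least two replicate measurements."""
    m1, m2 = statistics.mean(cssl), statistics.mean(recurrent)
    v1 = statistics.variance(cssl) / len(cssl)            # squared standard error, CSSL
    v2 = statistics.variance(recurrent) / len(recurrent)  # squared standard error, parent
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(cssl) - 1) + v2 ** 2 / (len(recurrent) - 1))
    return t, df
```

The p-value can then be taken from a t distribution with `df` degrees of freedom, for example with SciPy if it is available.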
We have enhanced the efficiency of generating CSSLs by developing several 384-SNP assays for genotyping and selection during the backcrossing process. The SNPs used to develop the mini-assays were selected from the 44 K SNP database and additional re-sequencing data based on their distribution across the genome and their ability to detect polymorphism in the specific parental accessions used to construct the CSSLs. A tool for selecting useful subsets of SNPs for applications in genetics and breeding is also under development and will be hosted on our project website (www.ricediversity.org). This tool will make it easy for researchers to directly utilize genomic information from this and other projects to develop customized SNP assays for multiple different types of applied and basic research in rice.
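During backcrossing and CSSL validation, genotypes from such assays are typically summarized as donor segments. The minimal sketch below walks one chromosome of one line in marker order and reports runs of donor-carrying markers. Segment ends are reported at the outermost donor-allele markers; the true breakpoints lie somewhere between those markers and the flanking recurrent-parent markers. The function name, the "H" heterozygote code and the `min_markers` filter are hypothetical.

```python
def donor_segments(calls, positions, donor, recurrent, min_markers=2):
    """Return (start_bp, end_bp) spans of consecutive donor-carrying markers
    on one chromosome of one backcross or CSSL line.

    calls, donor, recurrent: allele calls per marker, ordered by position.
    Heterozygous calls ("H") count as donor-carrying; missing or
    non-informative markers neither extend nor break a run.
    """
    segments, run = [], []
    for pos, g, d, r in zip(positions, calls, donor, recurrent):
        if g is None or d is None or r is None or d == r:
            continue
        if g == d or g == "H":
            run.append(pos)
        else:
            if len(run) >= min_markers:
                segments.append((run[0], run[-1]))
            run = []
    if len(run) >= min_markers:
        segments.append((run[0], run[-1]))
    return segments
```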
Recently, researchers have begun to identify networks of interacting loci that explain the genetic architecture of non-additive variation for rice traits such as starch biosynthesis and flowering time by leveraging prior knowledge of candidate genes from other species, as well as information derived from biochemical and regulatory pathways (Maas et al. 2010; Tian et al. 2009; Uwatoko et al. 2008). This work provides a gene-based paradigm for dissecting the genetics of transgressive variation, and establishes the groundwork for the development of synthetic models to guide the creation of new crop varieties with novel attributes. CSSLs and other genetic resources provide invaluable tools for translating gene-based discoveries into breeding realities because they represent a material link between genomic information and the reproductive organism.
With financial support from collaborators around the world (www.ricesnp.org), we are currently re-sequencing 100–150 accessions of wild and cultivated Asian and African rice using Illumina GAII technology. Accessions are selected to represent the diversity of O. sativa, O. rufipogon/O. nivara, O. glaberrima and O. barthii, and we are generating 5–55× genome coverage for each. A parallel effort is being undertaken for other wild Oryza genomes as part of the OMAP project by the Wing lab (http://www.omap.org/). These re-sequencing efforts are providing the plant breeding community with vital information about the distribution and frequencies of SNP alleles within and between populations of both wild and cultivated materials. This information will allow us to undertake evolutionary and population genetics analyses that will enable selection of the most informative SNPs for targeted applications in genetics, breeding and germplasm management.
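Coverage targets translate into sequencing effort through the Lander-Waterman relation C = NL/G, where C is the coverage, N the number of reads, L the read length and G the genome size. The read length and ~400 Mb genome size in the usage example are assumptions for illustration, and the estimate ignores duplicate, unmapped and low-quality reads.

```python
def reads_needed(coverage, genome_bp, read_len_bp):
    """Reads required for a target mean coverage: N = C * G / L, rounded up.
    Integer arguments keep the ceiling division exact."""
    return -(-coverage * genome_bp // read_len_bp)
```

For example, `reads_needed(5, 400_000_000, 75)` returns 26,666,667 reads for 5× coverage with hypothetical 75 bp reads. Reaching 55× requires eleven times as much sequencing.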
A future endeavor involves the design and development of a high-quality 950,000-SNP array for rice. The 950 K SNP chip will feature: (a) comprehensive coverage of the rice genome, with ~1 SNP/kb across the entire genome; (b) at least one SNP in every annotated single-copy gene; (c) a balanced spectrum of polymorphisms within and between sub-populations of O. sativa and its wild relatives O. rufipogon/O. nivara, and of O. glaberrima and its wild relative O. barthii; (d) the ability to detect copy number variation, by including probes for invariant sites distributed throughout the rice genome; and (e) the ability to target methylation sites in the rice genome as a basis for exploring the role of epigenetics in regulating trait expression.
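Criteria (a) and (b) above, roughly one SNP per kilobase and at least one SNP per single-copy gene, can be expressed as a simple two-pass selection. The sketch below is a hypothetical illustration on a single chromosome, not the actual chip-design algorithm. It keeps the candidate closest to the centre of each 1-kb window and reports the fraction of windows covered. It then adds a candidate inside any gene interval (inclusive coordinates) that is still uncovered. Criteria (c) to (e) are not modelled, and the function names and the centre-of-window rule are assumptions.

```python
import bisect
import math
from collections.abc import Iterable


def tile_snps(candidates: Iterable[int], window: int = 1_000) -> dict[int, int]:
    """Keep at most one candidate per window: the one closest to the window centre."""
    best: dict[int, int] = {}
    for pos in candidates:
        w = pos // window
        centre = w * window + window / 2
        if w not in best or abs(pos - centre) < abs(best[w] - centre):
            best[w] = pos
    return best


def add_gene_snps(selected: Iterable[int], candidates: Iterable[int],
                  genes: list[tuple[str, int, int]]) -> tuple[list[int], list[str]]:
    """Ensure each gene (id, start, end; inclusive) contains a chosen SNP whenever
    a candidate lies inside it. Return the chosen positions and uncovered gene ids."""
    chosen = sorted(selected)
    pool = sorted(candidates)
    uncovered = []
    for gene_id, start, end in genes:
        i = bisect.bisect_left(chosen, start)
        if i < len(chosen) and chosen[i] <= end:
            continue  # gene already contains a selected SNP
        j = bisect.bisect_left(pool, start)
        if j < len(pool) and pool[j] <= end:
            bisect.insort(chosen, pool[j])
        else:
            uncovered.append(gene_id)
    return chosen, uncovered


def design_chromosome(candidates: list[int], chrom_length: int,
                      genes: list[tuple[str, int, int]], window: int = 1_000):
    """Return (chosen positions, fraction of windows tiled, uncovered gene ids)."""
    tiled = tile_snps(candidates, window)
    n_windows = math.ceil(chrom_length / window)
    chosen, uncovered = add_gene_snps(tiled.values(), candidates, genes)
    return chosen, len(tiled) / n_windows, uncovered
```

Genes listed as uncovered have no candidate SNP inside them. In a design of this kind, they would point to regions that need further SNP discovery.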
As part of the larger Rice Diversity Research Platform, the 950 K SNP chip will be used to genotype several thousand rice accessions from the IRRI, GRIN, and NIAS germplasm banks. The data will be made available through public databases such as Gramene (www.gramene.org), the NIAS Oryza SNP database in Japan (http://oryza-snp.dna.affrc.go.jp/en/index_en.html), and the IRRI website (http://iris.irri.org). In parallel, scientists at IRRI are coordinating a broad-based phenotyping initiative and developing a rich set of mapping populations that will complement this genotyping effort.
The phenotypic and genotypic diversity data generated by this initiative will greatly expand our understanding of natural variation in crop plants and will enable more efficient use of the enormous wealth of diversity held in rice germplasm repositories around the world. This combination of genetic, genomic, and phenotypic information and resources will be immediately useful to the plant breeding community. It will also significantly increase the depth, breadth, and rigor of the genetic analyses that can be undertaken in rice. The ability to link sequence and diversity information to physiological function, plant development and agronomic traits in rice will attract a new generation of highly qualified young scientists to rice research. At the same time, it will broaden the foundation for comparative genomics, with rice serving as a pivotal reference genome.
We thank Teresa Hancock and Heather Maupin for their valuable assistance in phenotyping the rice diversity panel; Daniel Wood, Fumio “Gen” Onishi and Kazi Akther for crossing and genotyping during CSSL development; and Dr. Rolfe Bryant (USDA-ARS, Stuttgart, AR) for help with chemical analyses. This project is funded by the National Science Foundation Award 0606461 (to SMc; GE; AM; CB); the Crop Functional Genomics Center of the 21st Century Frontier Research Program (Project no. CG3113), Republic of Korea (to S-N. Ahn); and the USAID Linkage Program and the Government of Japan (to IRRI).
- Ahn SN, Suh JP, Oh CS, Lee SJ, Suh HS. Development of introgression lines of weedy rice in the background of Tongil-type rice. Rice Genetics Newsletter. 2002;19:14.
- Ali ML, McClung AM, Jia MH, Kimball J, McCouch SR, Eizenga GC. A “rice diversity panel” evaluated for genetic and agro-morphological diversity between sub-populations. Mol Breed. 2010a; in press.
- Ali ML, Sanchez PL, Yu SB, Lorieux M, Eizenga GC. Chromosome segment substitution lines: a powerful tool for the introgression of valuable genes from wild species of rice (Oryza spp.). Rice. 2010b; this issue.
- Ammiraju JSS, Luo M, Goicoechea JL, Wang W, Kudrna D, Mueller C, et al. The Oryza bacterial artificial chromosome library resource: construction and analysis of 12 deep-coverage large-insert BAC libraries that represent the 10 genome types of the genus Oryza. Genome Res. 2006;16:140–7.
- Cho YC, Suh JP, Choi IS, Hong HC, Baek MK, Kang KH, Kim YG, Ahn SN, Choi HC, Hwang HG, Moon HP. QTLs analysis of yield and its related traits in wild rice relative Oryza rufipogon. Treat of Crop Res In Korea 4. 2003.
- Ersoz ES, Yu J, Buckler ES. Applications of linkage disequilibrium and association mapping in maize. In: Molecular genetic approaches to maize improvement. Ed. 2009. p. 173–95.
- Nagasaki H, Ebana K, Shibaya T, Yonemaru J, Yano M. Core single-nucleotide polymorphisms - a tool for genetic analysis of the Japanese rice population. Breed Sci. 2010; in press.
- Oka HI. Origin of cultivated rice. Tokyo: Elsevier/Japan Scientific Societies Press; 1988.
- Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet. 2007;81:559–75.
- Semon M, Nielsen R, Jones MP, McCouch SR. The population structure of African cultivated rice Oryza glaberrima (Steud.): evidence for elevated levels of linkage disequilibrium caused by admixture with O. sativa and ecological adaptation. Genetics. 2005;169:1639–47.
- Sobrizal K, Ikeda K, Sanchez PL, Doi K, Angeles ER, Khush GS, et al. Development of Oryza glumaepatula introgression lines in rice, O. sativa L. Rice Genetics Newsletter. 1996;16:107.
- Tian F, Li D, Fu Q, Zhu Z, Fu Y, Wang X, et al. Construction of introgression lines carrying wild rice (Oryza rufipogon Griff.) segments in cultivated rice (Oryza sativa L.) background and characterization of introgressed segments associated with yield-related traits. Theor Appl Genet. 2006;112:570–80.
- Wright M, Tung C-W, Zhao K, Reynolds A, McCouch SR, Bustamante CD. ALCHEMY: a reliable method for automated SNP genotype calling for small batch sizes and highly homozygous populations. Bioinformatics. 2010. doi:10.1093/bioinformatics/btq533.
- Yamamoto T, Yonemaru J-I, Nagasaki H, Ebana K, Nakajima M, Shibaya T, et al. Fine definition of the pedigree haplotypes of closely related rice cultivars by means of genome-wide discovery of single-nucleotide polymorphisms. BMC Genomics. 2010;11:267.
- Zhao K, Wright M, Kimball J, Eizenga G, McClung A, Kovach M, et al. Genomic diversity and introgression in O. sativa reveal the impact of domestication and breeding on the rice genome. PLoS ONE. 2010;5(5):e10780. doi:10.1371/journal.pone.0010780.