Comparative Analysis of Rice Genome Sequence to Understand the Molecular Basis of Genome Evolution
- Cite this article as:
- Wu, J., Mizuno, H., Sasaki, T. et al. Rice (2008) 1: 119. doi:10.1007/s12284-008-9021-8
Accurate sequencing of the rice genome has ignited a passion for elucidating the mechanisms underlying sequence diversity among rice varieties and species, both in protein-coding regions and in genomic regions that are important for chromosome function. Here, we show examples of sequence diversity in genic and non-genic regions. Sequence analysis of chromosome ends has revealed that telomere repeat arrays differ in both sequence and distribution from chromosome to chromosome within a single plant. Detailed study has allowed us to speculate on the mechanism by which these arrays are generated. Sequence analysis of the sd1 gene, which contributed to the “Green Revolution” in rice, across various cultivated rice varieties and their wild progenitors has also demonstrated sequence diversity that correlates with taxonomic classification. These results indicate that detailed analysis and comparison of sequence diversity might give us clues to the mechanisms of rice genome evolution.
Keywords: Oryza; Sequence diversity; Telomere repeats; Comparative genomics
Rice is one of the three mega-crops (rice, maize, and wheat) on which more than half the world’s population relies as major sources of calories and protein. Genetic improvement based on molecular biotechnology requires an accurate genome sequence, which contributes to the establishment of genome-wide DNA markers for tagging and delimiting the genetic regions in which genes and quantitative trait loci (QTLs) are located. Global identification of protein-coding genes in the rice genome could enhance the discovery of the genes that are responsible for agronomically desirable traits.
The International Rice Genome Sequencing Project (IRGSP, 1997–2004), which was run by a collaborative research consortium of ten countries, succeeded in establishing the nucleotide sequence of the japonica rice cultivar Nipponbare to a high standard. The 370 million nucleotides from the 12 chromosomes of rice have now been widely utilized as molecular coordinates for investigating rice genomics and genetics. Although most of the euchromatic regions (95%) of the rice genome were covered by the published sequences, 62 gaps and the heterochromatic regions, centromeres, and telomeres, corresponding to 5% of the total genome, remained unrevealed. To acquire the sequences of these missing regions and thus improve the public rice genome sequence, new genomic libraries recently constructed from physically fragmented genomic DNA have been utilized, and these efforts have succeeded in revealing some of the junction sequences between the euchromatic and heterochromatic regions of rice telomeres. This information, which we describe here, should give us clues to understanding the molecular diversity of telomere structures and the mechanisms responsible for their generation.
The genus Oryza comprises 23 species, but one of the mysteries of rice history is that most of the modern varieties of rice, derived from Oryza sativa and Oryza glaberrima, are the descendants of only a specific lineage (the AA genomes). Oryza emerged about 20 to 22 Mya. There are geographic, physiological, and genetic diversities among Oryza species, including among rice varieties, landraces, and wild accessions. In recognition of the fact that this variation is indispensable for maintaining the vast genetic resources that should help in developing a sustainable future for the human race, rice is collected, evaluated, and stored in national or international gene banks (e.g., NIAS Genebank, http://www.gene.affrc.go.jp/about_en.php; Japanese National Bioresource Project, http://www.shigen.nig.ac.jp/rice/oryzabase/top/top.jsp; International Rice Research Institute Genetic Resource Center, http://www.irri.org/GRC/GRChome/Home.htm). Investigation of genome evolution among Oryza species has been facilitated by the global genome and information resources of the Oryza Map Alignment Project (OMAP, http://www.omap.org/).
Now that we are armed with these resources, it should be interesting to know how each gene or genomic region has evolved in the course of rice evolution and domestication. The elucidation of molecular diversity, as revealed by detailed sequence analysis, should be a fundamental product of such research.
Here, we present a review of comparative genomics based on information on the sequences of the genic region. A detailed molecular diversity analysis of both the exon and intron regions within the “Green Revolution” gene would not only present information on protein diversity but also give us clues to genomic conservation and development.
Diversity of the telomere region among rice chromosomes
Although the IRGSP attempted clone-by-clone genomic sequencing to cover the whole genome, clone gaps remained at the chromosomal ends. Because the restriction enzymes used in the construction of PAC/BAC libraries could not cut the canonical telomere array, (TTTAGGG)n, the libraries did not contain clones derived from telomeric sequences [5, 39]. To capture the telomere sequences, a rice fosmid library constructed by cloning randomly, mechanically sheared DNA was screened. The library enabled telomeric sequences to be obtained without the constraints imposed by enzyme site preferences. We describe here the characteristics of the telomeric regions on the basis of their sequence and length diversity among chromosomes.
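The restriction-site constraint can be illustrated with a minimal sketch. The four enzymes and recognition sites below are assumptions chosen for illustration; the text does not say which enzymes were used for the PAC/BAC libraries. The check relies only on the repeat unit itself: the TTTAGGG strand contains no C and its complement contains no G, so a recognition site that needs both G and C cannot occur on either strand.

```python
# Illustrative enzyme list (an assumption, not taken from the article).
SITES = {
    "HindIII": "AAGCTT",
    "EcoRI": "GAATTC",
    "BamHI": "GGATCC",
    "Sau3AI": "GATC",
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def sites_in_array(unit: str = "TTTAGGG", copies: int = 50) -> dict:
    """For each enzyme, report whether its site occurs on either strand
    of a perfect tandem array built from `unit`."""
    array = unit * copies
    rc = reverse_complement(array)
    return {name: (site in array or site in rc) for name, site in SITES.items()}


result = sites_in_array()
for name, found in result.items():
    print(f"{name}: {'site present' if found else 'no site'}")
```

A perfect array is an idealization; variant units near the subtelomere junction could in principle create a site, which this sketch does not model.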
The rice chromosomal end has tandemly repeated blocks of the sequence 5′-TTTAGGG-3′. These telomeric repeats are organized in the order of 5′-TTTAGGG-3′ from the chromosome-specific region [24, 42]. Near the junction between the telomere and the chromosome-specific region, the seven-nucleotide unit carries deletions, insertions, or substitutions of single nucleotides. The rate of accumulation of telomeric variants is higher in the proximal region than in the distal region, suggesting that the proximal region has rarely been reconstructed by telomerase on an evolutionary time scale.
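One crude way to see the proximal versus distal contrast is to measure how much of a stretch is covered by exact TTTAGGG units. The sketch below uses a made-up toy array, not rice sequence, and the coverage measure is an assumption chosen for simplicity rather than the method used in the cited studies.

```python
import re

CANONICAL = "TTTAGGG"


def canonical_fraction(seq: str) -> float:
    """Fraction of bases in `seq` covered by exact, non-overlapping
    TTTAGGG units (re.findall returns non-overlapping matches)."""
    if not seq:
        return 0.0
    hits = re.findall(CANONICAL, seq)
    return len(hits) * len(CANONICAL) / len(seq)


# Toy array written from the chromosome-specific side (proximal) outward.
# The proximal block mixes canonical units with a one-base deletion,
# a one-base insertion, and a one-base substitution.
proximal = "TTTAGGG" + "TTAGGG" + "TTTAGGG" + "TTTTAGGG" + "TTTAGCG"
distal = "TTTAGGG" * 5  # unbroken canonical repeats

prox_frac = canonical_fraction(proximal)
dist_frac = canonical_fraction(distal)
print(f"proximal: {prox_frac:.2f}, distal: {dist_frac:.2f}")
```

Note that a unit with an inserted base still contains an exact TTTAGGG, so this measure undercounts insertion variants; an alignment-based classification would be needed to separate the three variant types.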
Telomere length varies among accessions of rice. The telomeres of 31 rice accessions (both cultivars and wild species, which belong to AA, BB, BBCC, CC, CCDD, GG, or HHJJ species of Oryza) are 5 to 20 kb in length. Marked variation in telomere length is also observed among cultivated rice of the AA genome: the japonica cultivar Nipponbare shows a relatively low molecular weight (MW) pattern and the indica cultivar Kasalath shows a relatively high MW pattern. Moreover, telomere length varies among chromosomes in Nipponbare. Use of the fiber–fluorescent in situ hybridization (FISH) method has revealed the telomere length of individual chromosomes. Seven telomeres in Nipponbare range from 5.1 to 10.8 kb in length, corresponding to about 730 to 1,500 copies of the TTTAGGG telomeric repeat. The chromosome-dependent variation might be a consequence of genetic or epigenetic differences among the sequences of subtelomeres; these differences might affect the balance between telomere shortening and telomere elongation.
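The copy numbers quoted for the Nipponbare telomeres follow from dividing the array length by the 7-bp unit length. A minimal sketch of that arithmetic, assuming the whole measured length consists of 7-bp units:

```python
UNIT_LENGTH_BP = 7  # length of the TTTAGGG unit


def repeat_copies(length_kb: float, unit_bp: int = UNIT_LENGTH_BP) -> float:
    """Number of repeat units that fit in an array of the given length."""
    return length_kb * 1000 / unit_bp


low = repeat_copies(5.1)    # shortest Nipponbare telomere in the text
high = repeat_copies(10.8)  # longest Nipponbare telomere in the text
print(f"{low:.0f} to {high:.0f} copies")
```

The results are consistent with the rounded range given in the text.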
Telomere length has been reported in various plants: 2.5 kb in Arabidopsis thaliana; 4.5 kb at most in Melandrium album; 60 to 160 kb (in most cases 90 to 130 kb) in Nicotiana tabacum; and 1.8 to 40.0 kb in maize. Does telomere length differ among cells? In barley (Hordeum vulgare), wide variation in telomere length is observed during the differentiation or ageing of cells. The cells that develop in long-term callus cultures have very long telomeres. Thus, it is possible that telomere length in rice varies among tissues or developmental stages.
The rice telomere has diversity in both sequence and length. The mosaics of blocks of telomere variants might have resulted from slips during DNA synthesis, a high frequency of DNA recombination, or rapid deletion in the telomeric region, suggesting that the areas near the distal chromosome ends are dynamic and variable.
Diversity analysis of rice functional genes
Growing in a wide range of environments, the genus Oryza contains 23 species; rich in genomic diversity, they could serve not only as potential genetic resources for improvement of rice production but also as good research materials for studies of the evolutionary history and functionality of genes related to speciation, domestication, polyploidy, ecological adaptation, and human selection of rice. The public rice genome sequences obtained from two rice cultivars, Nipponbare (by the IRGSP) and 93-11 (by the Beijing Genomics Institute, BGI), as well as the wild rice BAC library resources established from the AA to HHKK genomes of Oryza species at the Arizona Genomics Institute (AGI), provide a good opportunity to carry out such studies [46, 10, 2]. For example, analyses of BAC end sequences and preliminary generation of BAC contigs by using the above libraries have been conducted. These studies suggested that repeat sequences play a role in genome size evolution and provided physical evidence of changes in genomic composition and structure between the different genomes of Oryza species. Materials on all BAC libraries and information on BAC end sequences and BAC contigs are available through the AGI BAC/EST Resource Center (http://www.genome.arizona.edu/orders).
Oryza sativa, also called Asian cultivated rice, is thought to have originated from the Asian wild rice Oryza rufipogon only about 10,000 years ago. Now grown throughout the world, Oryza sativa has two subspecies, indica and japonica. Knowledge of the molecular basis of phenotypic variation among rice species or subspecies would widen future rice breeding possibilities. With this purpose, the Rice Genome Research Program (RGP) has constructed nine novel BAC libraries from species that carry the AA genome, as an important resource for comparative analysis of rice genomes. These include the three Asian rice varieties Kasalath (indica), Shuusoushu (indica), and Khau Mac Kho (japonica) from O. sativa, one accession from the African cultivated rice O. glaberrima, and one accession from each of the wild rice species Oryza rufipogon, Oryza barthii, Oryza glumaepatula, Oryza meridionalis, and Oryza longistaminata. By chromosomal in silico mapping of 78,427 high-quality BAC end sequences, 450 Kasalath BAC contigs that consisted of 12,170 clones and covered 308.5 Mbp of the genome were generated. These resources are freely accessible through the RGP homepage (BAC end sequences at http://rgp.dna.affrc.go.jp/blast/runblast.html, BAC contigs at http://rgp.dna.affrc.go.jp/E/publicdata/kasalathendmap/index.html) for researchers to perform comparative analyses of the genomes of the two subspecies of O. sativa and to generate single nucleotide polymorphism (SNP) or indel markers for genetic studies.
Both basic and applied research on rice genes has been carried out in the past decade, and especially after the completion of sequencing of the two rice genomes (Nipponbare and 93-11), genomic and genetic analyses have greatly increased our understanding of the function of the rice genome. Among the most important achievements are the current use of advanced QTL mapping and genomic sequencing techniques for successful cloning and functional analysis of the rice genes controlling agriculturally important traits. For example, the structure and function of the genes involved in spikelet shattering, grain number, grain shape (width and length), photoperiod sensitivity, tillering, and plant architecture have been reported [3, 7, 19, 21, 22, 29, 33, 36, 43]. It will be both scientifically interesting and agriculturally important to investigate the sequence diversity of these genes among different varieties and species; this information could not only provide valuable information on evolutionary history of a crop but also lead to the discovery of new alleles for the improvement of rice breeding.
To date, only a few genes have had their sequences extensively investigated and compared across the different Oryza species to elucidate molecular and evolutionary mechanisms. The genes most analyzed for sequence comparisons in Oryza are probably the alcohol dehydrogenases (Adh). Ge et al. were the first to sequence two genes (Adh1 and Adh2) from 31 accessions representing all 23 rice species; they reported the phylogenetic relationships among the different Oryza species as determined from the sequence polymorphisms. Yoshida et al. [44, 45] have investigated the nucleotide diversity in the Adh1 and Adh2 gene regions of O. rufipogon in order to clarify the mechanisms by which DNA variation is maintained.
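Nucleotide diversity is commonly summarized as the average number of pairwise differences per site among aligned sequences. The sketch below assumes that definition; the text does not state which estimator the cited studies used, and the toy alignment is made up rather than taken from Adh or sd1 data.

```python
from itertools import combinations


def nucleotide_diversity(aligned: list[str]) -> float:
    """Average pairwise differences per site across aligned,
    equal-length sequences."""
    if len(aligned) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned[0])
    if length == 0 or any(len(s) != length for s in aligned):
        raise ValueError("sequences must share the same non-zero length")
    pairs = list(combinations(aligned, 2))
    # Count mismatching columns for every pair of sequences.
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / (len(pairs) * length)


# Hypothetical toy alignment of four 10-bp sequences.
toy = ["ATGCATGCAT", "ATGCATGCAT", "ATGAATGCAT", "ATGAATGCTT"]
pi = nucleotide_diversity(toy)
print(f"pi = {pi:.4f}")
```

This sketch treats a gap character as an ordinary mismatch; analyses of real alignments usually exclude gapped sites before counting.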
Table: Summary of sequence comparison in the entire region of the sd1 gene among Oryza species, using Nipponbare as a reference. Columns: species/subspecies, variety/accession, genome; entire region (bp); peptide (a.a.). Surviving row labels: O. sativa ssp. japonica (Khau Mac Kho); O. sativa ssp. indica.
The sd1 gene was first identified in the Chinese variety Dee-geo-woo-gen (DGWG) and was crossed at the International Rice Research Institute (IRRI) in the early 1960s with Peta (tall) to develop the semidwarf cultivar IR8. Genetic and molecular analyses have demonstrated that the sd1 gene in DGWG contains a 383-bp deletion spanning parts of the first and second exons and resulting in a frameshift that gives a stop codon within the coding sequence [26, 29]. A similar deletion (280-bp) was detected in the semidwarf indica rice cultivar Doongara. Additional alleles that carry a single mutation causing changes in the amino acid residues in the semidwarf japonica rice cultivars Jikkoku (in exon 1), Calrose76 (in exon 2), and Reimei (in exon 3) have also been found [29, 34]. Interestingly, two accessions of wild rice O. rufipogon (W1944 and W1718) are reported to carry the DGWG allele, suggesting the preservation and human use of natural alleles from the wild progenitor. However, our examination of the sd1 gene sequence within the 17 cultivated and wild rice species revealed none of these types of alleles. The rice cultivar 93-11 seems to encode a truncated protein owing to the presence of a premature stop codon; this codon could, however, be considered as a null allele, because 93-11 does not have a semidwarf phenotype. We also surveyed the presence of the alleles reported above within 60 accessions of O. sativa and 34 accessions of O. rufipogon by using the world core collections from the National Institute of Agrobiological Sciences, National Institute of Genetics, and IRRI. Along with the two modern indica cultivars IR58 and Milyang 23, another rice cultivar, Rexmont, from the USA, contains the DGWG type of allele. No other varieties within the above collections carry any of the remaining types of known alleles.
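Why a deletion can produce a premature stop codon is easiest to see on a toy coding sequence. The sequence and deletion positions below are hypothetical and unrelated to sd1; the sketch only shows that removing a number of coding bases that is not a multiple of three shifts the reading frame, whereas removing a multiple of three drops whole codons. For a genomic deletion such as the one in DGWG, what matters is the number of coding bases removed, which the genomic length alone does not determine.

```python
BASES = "TCAG"
# Standard genetic code, listed in TCAG order for each codon position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate from the first base until the first stop codon;
    an incomplete trailing codon is ignored."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE[cds[pos:pos + 3]]
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


def delete(seq: str, start: int, length: int) -> str:
    """Remove `length` bases beginning at 0-based index `start`."""
    return seq[:start] + seq[start + length:]


toy_cds = "ATGGCTCTGAAGGTTTAA"       # hypothetical coding sequence
frameshift = delete(toy_cds, 4, 1)   # 1 base removed: frame shifts
in_frame = delete(toy_cds, 3, 3)     # 3 bases removed: one codon lost

print(translate(toy_cds), translate(frameshift), translate(in_frame))
```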
Table: Estimated nucleotide diversity within the sd1 gene region of O. sativa and its divergence from other Oryza species, in comparison with the Adh1 gene region. Surviving labels: No. of sites; entire genic region.
Genome sequences from many plant species have been published, and more than 150 projects aiming to sequence plant genomes are either completed or ongoing (Genomes OnLine Database v2.0, http://genomesonline.org/). Only the rice and Arabidopsis genomes have been sequenced completely. As the conversion of draft sequences to “finished” ones takes huge amounts of time, effort, and funding, these two plants will serve as reference genomes for the study of monocot and dicot plants, respectively. The emerging ultra-high-throughput sequencing technology will enable us to obtain whole-genome information in less time, which can then be mapped to and compared with these references. Studies of genome sequences within and among Oryza species will produce a concrete database for comparative genomics. We will be able to use this database to investigate both the evolution and function of regions, genes, motifs, and sequences within the genome.
We thank all the members of the Rice Genome Research Program for joining our research and discussion. We also thank Dr. N. Kurata (National Institute of Genetics), Drs. D. A. Vaughan, K. Ebana, and T. Izawa (National Institute of Agrobiological Resource Sciences), and Dr. R. A. Wing (the Arizona Genomics Institute) for providing the plant materials and BAC libraries used in this study. This work was supported by a grant from the Ministry of Agriculture, Forestry, and Fisheries of Japan (Green Technology Project GD-2007).