Abstract
Nucleotide composition varies among organisms and among genes within a genome. Characterizing this variation and its consequences provides fundamental information for understanding genome evolution. Lotus (Nelumbo nucifera Gaertn.), a basal eudicot, diverged from its closest sister lineage about 135–125 million years ago (MYA). This unique phylogenetic position makes lotus important for evolutionary studies. With the draft genome sequence of lotus available, we analyzed its nucleotide composition in comparison with other sequenced plant genomes. The GC content of the lotus genome was intermediate between those of grasses and eudicots. The lotus genome had more coding sequences with high GC3 (GC content at the third codon position) than did the core eudicots, and its GC3 composition exhibited a negative gradient along the direction of transcription. The difference between purines and pyrimidines in the coding sequences of lotus was the smallest among the selected eudicot species. The lotus genome had a lower observed/expected (O/E) ratio of the CpG dinucleotide than the selected dicots and monocots, except for grape. This lower O/E ratio is likely caused by instability of transposable elements rather than by a higher mutation rate.
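The abstract names several composition statistics without giving their formulas. The sketch below shows one common way to compute them for a single sequence. The definitions are assumptions, not taken from the article: GC3 is the GC fraction at third codon positions of an in-frame coding sequence, the CpG O/E ratio is (number of CG dinucleotides × sequence length) / (number of C × number of G), and the purine/pyrimidine difference is ((A + G) − (C + T)) divided by the number of unambiguous bases. The function names are hypothetical, and the authors' actual pipeline may differ.

```python
def _counts(seq):
    """Count unambiguous bases in a DNA string (case-insensitive)."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "ACGT"}


def gc_content(seq):
    """Fraction of G and C among unambiguous bases (A, C, G, T)."""
    n = _counts(seq)
    total = sum(n.values())
    return (n["G"] + n["C"]) / total if total else 0.0


def gc3(cds):
    """GC fraction at third codon positions.

    Assumes the coding sequence starts in frame; a trailing partial
    codon is ignored.
    """
    usable = len(cds) - len(cds) % 3
    third_positions = cds[2:usable:3]
    return gc_content(third_positions)


def cpg_oe(seq):
    """Observed/expected CpG ratio (assumed definition, see lead-in)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    observed = seq.count("CG")  # "CG" cannot overlap itself
    return observed * len(seq) / (c * g)


def purine_pyrimidine_diff(seq):
    """(A + G - C - T) / total unambiguous bases (assumed definition)."""
    n = _counts(seq)
    total = sum(n.values())
    if total == 0:
        return 0.0
    return ((n["A"] + n["G"]) - (n["C"] + n["T"])) / total


if __name__ == "__main__":
    toy_cds = "ATGGCCGCGTAA"  # toy sequence, not lotus data
    print(gc_content(toy_cds), gc3(toy_cds))
    print(cpg_oe(toy_cds), purine_pyrimidine_diff(toy_cds))
```

A genome-scale comparison like the one described would apply such functions to every annotated coding sequence and summarize the resulting distributions per species. A gradient along the direction of transcription could be examined by computing GC3 in successive windows of codons from the 5′ end of each gene.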
Acknowledgements
This work was supported by Texas A&M AgriLife Research.
Additional information
Communicated by Paul Moore
Electronic supplementary materials
Below is the link to the electronic supplementary material.
Supplemental Fig. S1
(DOCX 55 kb)
Cite this article
Singh, R., Ming, R. & Yu, Q. Nucleotide Composition of the Nelumbo nucifera Genome. Tropical Plant Biol. 6, 85–97 (2013). https://doi.org/10.1007/s12042-013-9123-3