Abstract
GC content, an important compositional feature of genomes, varies significantly among genomes and among regions within a genome. Identifying the forces that shaped GC content and deciphering the biological meaning of its variation will help us understand genome evolution. We analyzed and compared the GC contents of 20 selected plant species representing the major evolutionary lineages. Our results revealed the highest GC content and GC heterogeneity in the grass genomes, followed by the non-grass monocot and dicot genomes. Detailed analysis of GC content in genic regions showed higher GC content in terminal exons than in internal exons in all selected species except Volvox carteri. A strong correlation between the GC contents of exons and their neighboring introns at the ends of genes was observed in all the grass genomes and in the Musa acuminata, Spirodela polyrhiza and Nelumbo nucifera genomes. Our results suggest that the widely reported negative gradient of GC3 (GC content at third codon positions) along coding sequences from 5′ to 3′ is likely an artifact of calculating GC content on an admixture of genes with variable lengths and exon numbers. Our findings support a role for GC-biased gene conversion in shaping the nucleotide composition landscapes of monocots. The U-shaped pattern of GC content along genes may have resulted from variable degrees of interaction among the transcription, replication and DNA repair machineries. Transcription-associated recombination might play a major role in GC content evolution.
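For readers unfamiliar with the two quantities compared throughout the abstract, the following is a minimal sketch of how overall GC content and GC3 might be computed for a single coding sequence. It is not the authors' pipeline: the function names `gc_content` and `gc3_content`, the choice to ignore ambiguous bases, and the choice to keep the stop codon in the calculation are all assumptions made for illustration.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among unambiguous bases (A, C, G, T).

    Ambiguous symbols such as N are ignored (an assumption of this sketch).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    return gc / total if total else float("nan")


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence.

    Assumes the sequence starts at codon position 1; a trailing partial
    codon is dropped, and the stop codon (if present) is included.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3   # length of the complete codons
    third_positions = cds[2:usable:3]  # every third base: indices 2, 5, 8, ...
    return gc_content(third_positions)


# Hypothetical toy sequence: ATG GCC GCG TTC TAA
example_cds = "ATGGCCGCGTTCTAA"
print(f"GC  = {gc_content(example_cds):.3f}")
print(f"GC3 = {gc3_content(example_cds):.3f}")
```

Applying `gc3_content` to consecutive windows or to individual exons of a gene, rather than to the whole coding sequence, would give the kind of positional profile the abstract refers to; how genes of different lengths are pooled at that step is exactly where the abstract suggests an artifact can arise.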
Communicated by: Paulo Arruda
Electronic supplementary material
Below is the link to the electronic supplementary material.
Sup. Fig. 1
Variation of GC3 content from the 5′ end to the 3′ end in (a) GC-poor (GC < 60 %) and (b) GC-rich (GC ≥ 60 %) coding sequences of the 20 selected species. The GC contents of grasses and dicots were averaged and are shown as “Grasses_avg” and “Dicot_avg”, respectively. The error bars represent the standard deviation of GC content among the members of the grasses and dicots. (PDF 1014 kb)
Sup. Fig. 2–21
Box plots of the GC content of each exon in subsets of genes grouped by number of exons. Genes with the same number of exons were placed in one group, and a box plot was drawn for each subset individually. The first plot for each species was drawn on the admixture of all genes within the species. Within each set, genes were further divided into GC-rich (red) and GC-poor (blue). Red boxes are missing in some plots because no GC-rich genes with that exon number were found. The exon index is shown on the x-axis and the GC content on the y-axis. Sup. Fig. 2–21 represent the plant species in the following order: P. trichocarpa; A. thaliana; C. papaya; V. vinifera; N. nucifera; S. polyrhiza; P. equestris; P. dactylifera; M. acuminata; A. comosus; S. bicolor; Z. mays; S. italica; O. sativa; B. distachyon; A. trichopoda; P. abies; S. moellendorffii; P. patens; V. carteri. (PDF 5278 kb)
Sup. Fig. 22
Matrix plot of correlations of GC contents between indexed intron and exon pairs. The exon index is presented on the x-axis and the intron index on the y-axis. Each circle in the plot represents the correlation of GC content between the intron and the exon at the assigned index. The size of each circle corresponds to the magnitude of the correlation and the colors represent its direction. Green (r < 0.4) and red (r ≥ 0.4) colors indicate positive correlation while yellow (r < −0.4) and purple (r ≥ −0.4) represent negative correlation. (PDF 3342 kb)
Sup. Fig. 23
Matrix plot of correlations of GC contents between indexed intron and exon pairs in a subset of genes with 15 exons. The exon index is presented on the x-axis and the intron index on the y-axis. Each circle in the plot represents the correlation of GC content between the intron and the exon at the assigned index. The size of each circle corresponds to the magnitude of the correlation and the colors represent its direction. Green (r < 0.4) and red (r ≥ 0.4) colors indicate positive correlation while yellow (r < −0.4) and purple (r ≥ −0.4) represent negative correlation. (PDF 2436 kb)
Sup. Fig. 24
Scatterplots of intron GC content (y-axis) against exon GC content (x-axis) for all 20 selected genomes. Genes >5000 nt are shown in shades of red and smaller genes in shades of blue. The density of the colors corresponds to the number of genes plotted in the area. Pearson’s correlation coefficients (r) between the GC contents for large and small genes are given below each window. (PDF 5832 kb)
Sup. Fig. 25
Scatterplot of the cumulative length of introns in a gene (y-axis) against the average GC content of exons in the corresponding gene (x-axis). Genes containing 10 or more introns are shown in shades of red and genes with fewer than 10 introns in shades of blue. The density of the colors corresponds to the number of genes plotted in the area. Pearson’s correlation coefficients (r) between intron length and exon GC content are given below each window. (PDF 5748 kb)
Cite this article
Singh, R., Ming, R. & Yu, Q. Comparative Analysis of GC Content Variations in Plant Genomes. Tropical Plant Biol. 9, 136–149 (2016). https://doi.org/10.1007/s12042-016-9165-4
DOI: https://doi.org/10.1007/s12042-016-9165-4