Tropical Plant Biology, Volume 9, Issue 3, pp 136–149

Comparative Analysis of GC Content Variations in Plant Genomes

Article

DOI: 10.1007/s12042-016-9165-4

Cite this article as:
Singh, R., Ming, R. & Yu, Q. Tropical Plant Biol. (2016) 9: 136. doi:10.1007/s12042-016-9165-4

Abstract

GC content, an important compositional feature of the genome, varies significantly among genomes and among regions within a genome. Identifying the driving forces that shaped GC content and deciphering the biological meaning of its variation will help us understand genome evolution. We analyzed and compared the GC contents of 20 selected plant species representing the major evolutionary lineages. Our results revealed the highest GC content and GC heterogeneity in the grass genomes, followed by the non-grass monocot and dicot genomes. Detailed analysis of GC content in genic regions showed higher GC content in terminal exons than in internal exons in all selected species except Volvox carteri. A strong correlation between the GC contents of exons and their neighboring introns at the termini of genes was observed in all the grass genomes and in the Musa acuminata, Spirodela polyrhiza and Nelumbo nucifera genomes. Our results suggest that the widely reported negative gradient of GC3 (GC content at third codon positions) along coding sequences from 5′ to 3′ is likely an artifact caused by calculating GC content on an admixture of genes with variable lengths and exon numbers. Our findings support a role for GC-biased gene conversion in shaping the nucleotide composition landscapes of monocots. The U-shaped pattern of GC content along genes may have resulted from variable degrees of interaction among the transcription, replication and DNA repair machineries. Transcription-associated recombination might play a major role in GC content evolution.
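The two quantities the abstract relies on, overall GC content and GC3, can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `gc_content` and `gc3_content` are hypothetical, and the sketch assumes an in-frame coding sequence and ignores ambiguous bases such as N.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    total = gc + seq.count("A") + seq.count("T")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total


def gc3_content(cds: str) -> float:
    """GC fraction at third codon positions of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    third_positions = cds[2:usable:3]  # every third base, starting at index 2
    return gc_content(third_positions)


if __name__ == "__main__":
    toy_cds = "ATGGCCAAATTTGGGTAA"  # made-up sequence, for illustration only
    print(f"GC  = {gc_content(toy_cds):.3f}")
    print(f"GC3 = {gc3_content(toy_cds):.3f}")
```

Applying `gc3_content` to successive windows of each coding sequence gives a 5′ to 3′ profile. Averaging such profiles over genes of different lengths and exon numbers is the kind of pooling the abstract identifies as a likely source of the apparent negative GC3 gradient.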

Keywords

GC content; Ananas comosus; GC biased gene conversion; Transcription associated recombination

Supplementary material

12042_2016_9165_MOESM1_ESM.pdf (1015 kb)
Sup. Fig. 1 Variation of GC3 content from the 5′ end to the 3′ end in (a) GC poor (GC < 60 %) and (b) GC rich (GC ≥ 60 %) coding sequences of the 20 selected species. The GC contents of grasses and dicots were averaged and are represented as “Grasses_avg” and “Dicot_avg”, respectively. The error bars represent the standard deviation of GC contents among the members of grasses and dicots. (PDF 1014 kb)
12042_2016_9165_MOESM2_ESM.pdf (967 kb)
(PDF 966 kb)
12042_2016_9165_MOESM3_ESM.pdf (5.2 mb)
Sup. Fig. 2–21 Box plots of GC contents of each exon in subsets of genes grouped by number of exons. Genes with the same number of exons were placed in one group, and a box plot was drawn for each subset individually. The first plot for each species was drawn on the admixture of all genes within the species. Within each set, genes were further divided into GC rich (red) and GC poor (blue). Red boxes are missing in some plots because no GC rich genes with that exon number were found. The exon index is presented on the x-axis and the GC content on the y-axis. Sup. Fig. 2–21 represent the plant species in the following order: P. trichocarpa; A. thaliana; C. papaya; V. vinifera; N. nucifera; S. polyrhiza; P. equestris; P. dactylifera; M. acuminata; A. comosus; S. bicolor; Z. mays; S. italica; O. sativa; B. distachyon; A. trichopoda; P. abies; S. moellendorffii; P. patens; V. carteri. (PDF 5278 kb)
12042_2016_9165_MOESM4_ESM.pdf (4.9 mb)
(PDF 5053 kb)
12042_2016_9165_MOESM5_ESM.pdf (4.9 mb)
(PDF 4983 kb)
12042_2016_9165_MOESM6_ESM.pdf (5.1 mb)
(PDF 5215 kb)
12042_2016_9165_MOESM7_ESM.pdf (5.3 mb)
(PDF 5382 kb)
12042_2016_9165_MOESM8_ESM.pdf (5.2 mb)
(PDF 5357 kb)
12042_2016_9165_MOESM9_ESM.pdf (5 mb)
(PDF 5075 kb)
12042_2016_9165_MOESM10_ESM.pdf (5.2 mb)
(PDF 5328 kb)
12042_2016_9165_MOESM11_ESM.pdf (5.3 mb)
(PDF 5465 kb)
12042_2016_9165_MOESM12_ESM.pdf (5.4 mb)
(PDF 5486 kb)
12042_2016_9165_MOESM13_ESM.pdf (5.3 mb)
(PDF 5422 kb)
12042_2016_9165_MOESM14_ESM.pdf (5.5 mb)
(PDF 5602 kb)
12042_2016_9165_MOESM15_ESM.pdf (5.3 mb)
(PDF 5465 kb)
12042_2016_9165_MOESM16_ESM.pdf (5.3 mb)
(PDF 5476 kb)
12042_2016_9165_MOESM17_ESM.pdf (5.3 mb)
(PDF 5453 kb)
12042_2016_9165_MOESM18_ESM.pdf (4.9 mb)
(PDF 4975 kb)
12042_2016_9165_MOESM19_ESM.pdf (4.8 mb)
(PDF 4897 kb)
12042_2016_9165_MOESM21_ESM.pdf (5.2 mb)
(PDF 5278 kb)
12042_2016_9165_MOESM22_ESM.pdf (5.3 mb)
(PDF 5432 kb)
12042_2016_9165_MOESM23_ESM.pdf (3.3 mb)
Sup. Fig. 22 Matrix plot of correlations of GC contents between indexed intron and exon pairs. The exon index is presented on the x-axis and the intron index on the y-axis. Each circle in the plot represents the correlation of GC content between the intron and the exon at the assigned index. The size of each circle in the matrix plot corresponds to the magnitude of the correlation, and colors represent its direction. Green (r < 0.4) and red (r ≥ 0.4) colors indicate positive correlation, while yellow (r < −0.4) and purple (r ≥ −0.4) represent negative correlation. (PDF 3342 kb)
12042_2016_9165_MOESM24_ESM.pdf (2.4 mb)
Sup. Fig. 23 Matrix plot of correlations of GC contents between indexed intron and exon pairs in a subset of genes with 15 exons. The exon index is presented on the x-axis and the intron index on the y-axis. Each circle in the plot represents the correlation of GC content between the intron and the exon at the assigned index. The size of each circle in the matrix plot corresponds to the magnitude of the correlation, and colors represent its direction. Green (r < 0.4) and red (r ≥ 0.4) colors indicate positive correlation, while yellow (r < −0.4) and purple (r ≥ −0.4) represent negative correlation. (PDF 2436 kb)
12042_2016_9165_MOESM25_ESM.pdf (5.7 mb)
Sup. Fig. 24 Scatterplots of intron GC content (y-axis) against exon GC content (x-axis) for all 20 selected genomes. Genes longer than 5000 nt are shown in shades of red and smaller genes in shades of blue. The density of the colors corresponds to the number of genes plotted in the area. Pearson’s correlation coefficients (r) between the GC contents for large and small genes can be found below each window. (PDF 5832 kb)
12042_2016_9165_MOESM26_ESM.pdf (5.6 mb)
Sup. Fig. 25 Scatterplot of the cumulative length of introns in a gene (y-axis) against the average GC content of exons in the corresponding gene (x-axis). Genes containing 10 or more introns are shown in shades of red and genes with fewer than 10 introns in shades of blue. The density of the colors corresponds to the number of genes plotted in the area. Pearson’s correlation coefficients (r) between the intron length and exon GC content can be found below each window. (PDF 5748 kb)
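
The per-group Pearson coefficients described for Sup. Fig. 24 could be computed along the following lines. This is a sketch under stated assumptions, not the authors' code: the input layout (one tuple of gene length, exon GC and intron GC per gene), the function names and the toy values are all hypothetical. Only the 5000 nt cutoff comes from the caption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


def exon_intron_gc_correlation(genes, length_cutoff=5000):
    """Correlate exon and intron GC separately for large and small genes.

    `genes` is an iterable of (gene_length_nt, exon_gc, intron_gc) tuples;
    this layout is an assumption made for the sketch.
    """
    genes = list(genes)
    groups = {
        "large": [g for g in genes if g[0] > length_cutoff],
        "small": [g for g in genes if g[0] <= length_cutoff],
    }
    return {
        label: pearson_r([g[1] for g in members], [g[2] for g in members])
        for label, members in groups.items()
    }


if __name__ == "__main__":
    # Made-up values, for illustration only.
    toy_genes = [
        (6000, 0.40, 0.30), (7000, 0.50, 0.40), (8000, 0.60, 0.50),
        (1000, 0.40, 0.50), (2000, 0.50, 0.40), (3000, 0.60, 0.30),
    ]
    print(exon_intron_gc_correlation(toy_genes))
```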

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. Texas A&M AgriLife Research Center at Dallas, Texas A&M University System, Dallas, USA
  2. Department of Plant Biology, University of Illinois at Urbana-Champaign, Urbana, USA
  3. Department of Plant Pathology & Microbiology, Texas A&M University, College Station, USA
