The Mystery of Two Straight Lines in Bacterial Genome Statistics

  • Original Article
Bulletin of Mathematical Biology

Abstract

In special coordinates (codon position-specific nucleotide frequencies), bacterial genomes form two straight lines in 9-dimensional space: one line for eubacterial genomes and another for archaeal genomes. All 348 distinct bacterial genomes available in GenBank in April 2007 lie on these lines with high accuracy. The main challenge now is to explain this high accuracy. We also observe a new phenomenon: complementary symmetry of codon position-specific nucleotide frequencies. We present an analysis of several codon usage models and demonstrate that the mean-field approximation, also known as the context-free model, the complete independence model, or the Segre variety, can serve as a reasonable approximation to real codon usage. The first two principal components of codon usage correlate strongly with genomic G+C content and the optimal growth temperature, respectively. The variation of codon usage along the third component is related to the curvature of the mean-field approximation. The first three eigenvalues in the codon usage PCA explain 59.1%, 7.8% and 4.7% of the variation. The codon usage of eubacterial and archaeal genomes is clearly distributed along two third-order curves with genomic G+C content as a parameter.
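To make the coordinates concrete, the following is a minimal Python sketch of how codon position-specific nucleotide frequencies and the mean-field (complete independence) estimate of codon usage could be computed. It is an illustration under stated assumptions, not the authors' pipeline: the function names and the toy sequence are hypothetical, and a real analysis would pool the annotated coding sequences of a whole genome. Each codon position contributes four frequencies that sum to one, so the twelve numbers carry nine independent coordinates, which is consistent with the 9-dimensional space mentioned above.

```python
from collections import Counter
from itertools import product

NUCLEOTIDES = "ACGT"


def codon_counts(seq):
    """Count in-frame codons; a trailing partial codon is dropped."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    # Skip codons containing ambiguous symbols such as N.
    return Counter(c for c in codons if set(c) <= set(NUCLEOTIDES))


def position_frequencies(counts):
    """Return p[k][n]: frequency of nucleotide n at codon position k (0, 1, 2)."""
    total = sum(counts.values())
    p = [{n: 0.0 for n in NUCLEOTIDES} for _ in range(3)]
    for codon, c in counts.items():
        for k, n in enumerate(codon):
            p[k][n] += c / total
    return p


def mean_field(p):
    """Independence estimate: codon frequency as a product of the
    three position-specific nucleotide frequencies."""
    return {a + b + c: p[0][a] * p[1][b] * p[2][c]
            for a, b, c in product(NUCLEOTIDES, repeat=3)}


# Hypothetical toy coding sequence: ATG GCG AAA TGA
toy = "ATGGCGAAATGA"
counts = codon_counts(toy)
p = position_frequencies(counts)
mf = mean_field(p)
```

Comparing `mf` with the observed codon frequencies (`counts` divided by the total) gives a rough sense of how far a genome departs from the mean-field approximation; on a four-codon toy sequence the comparison is of course not meaningful.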

Author information

Corresponding author

Correspondence to A. N. Gorban.

About this article

Cite this article

Gorban, A.N., Zinovyev, A.Y. The Mystery of Two Straight Lines in Bacterial Genome Statistics. Bull. Math. Biol. 69, 2429–2442 (2007). https://doi.org/10.1007/s11538-007-9229-6

  • DOI: https://doi.org/10.1007/s11538-007-9229-6
