An Evolution Model for Sequence Length Based on Residue Insertion–Deletion Independent of Substitution: An Application to the GC Content in Bacterial Genomes

  • Original Article
Bulletin of Mathematical Biology

Abstract

We introduce here a gene evolution model which is an extension of the time-continuous stochastic IDIS model (Lèbre and Michel in Comput. Biol. Chem. 34:259–267, 2010) to sequence length. This new IDISL (Insertion Deletion Independent of Substitution based on sequence Length) model gives an analytical expression of the residue occurrence probability p(l) at sequence length l depending on stochastically independent processes of substitution, insertion, and deletion. Furthermore, in contrast to all mathematical models in this research field, the substitution, insertion, and deletion parameters of the IDISL model are independent of each other. For any diagonalizable substitution matrix M, the residue occurrence probability p(l) is given as a function of the eigenvalues of M, the eigenvector matrix of M, a vector r of the residue insertion rates, a deletion rate d (unlike our previous IDIS model), and a vector of the initial residue occurrence probability p(l₀) at sequence length l₀.

In a further difference from classical evolution approaches, which mainly focus on sequence alignment, the IDIS class of models allows a mathematical analysis of the behavior of the residue occurrence probability according to either evolution time or sequence length. The length parameter can be associated with any nucleotide region: genes, genomes, introns, repeats, 5′ and 3′ regions, etc. Three properties of the IDISL model are given in relation to the sequence length l: parameter scale, inverse evolution, and residue equilibrium distribution. Nucleotide occurrence probabilities are given in the particular case of the IDISL-HKY model, i.e. the IDISL model associated with the HKY asymmetric substitution matrix (Hasegawa et al. in J. Mol. Evol. 22:160–174, 1985).
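As an illustration of the ingredients named above (a diagonalizable substitution matrix, its eigenvalues and its eigenvector matrix), the following minimal sketch builds an HKY matrix and diagonalizes it with NumPy. It assumes the rate-matrix parameterization of HKY that is common in phylogenetics (off-diagonal entry κ·π_j for transitions and π_j for transversions, nucleotide order A, C, G, T), which may differ from the convention used in the article. The function name hky_rate_matrix and the values chosen for pi and kappa are hypothetical. The sketch does not reproduce the IDISL expression for p(l), which is not given in this abstract.

```python
import numpy as np

def hky_rate_matrix(pi, kappa):
    """Build an HKY-style rate matrix (assumed parameterization, order A, C, G, T).

    pi    : equilibrium frequencies of A, C, G, T (should sum to 1)
    kappa : transition/transversion rate ratio
    """
    pi = np.asarray(pi, dtype=float)
    # Transitions are A<->G (purines) and C<->T (pyrimidines).
    transitions = {(0, 2), (2, 0), (1, 3), (3, 1)}
    Q = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            if i != j:
                rate = kappa if (i, j) in transitions else 1.0
                Q[i, j] = rate * pi[j]
        # Diagonal entry makes each row sum to zero.
        Q[i, i] = -Q[i].sum()
    return Q

# Illustrative values only, not taken from the article.
pi = np.array([0.3, 0.2, 0.3, 0.2])
Q = hky_rate_matrix(pi, kappa=2.0)

# Eigenvalues and eigenvector matrix of the substitution matrix.
eigenvalues, eigenvectors = np.linalg.eig(Q)
```

With these inputs, Q can be rebuilt as eigenvectors · diag(eigenvalues) · eigenvectors⁻¹, which is the kind of decomposition an analytical solution in terms of eigenvalues and eigenvectors relies on.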

An application of the IDISL model is developed for a massive statistical analysis of GC content in all complete bacterial genomes available to date (894 non-anaerobic and anaerobic genomes). The IDISL-HKY model confirms the increase of the GC content with the genome length for two non-anaerobic taxonomic groups of bacterial genomes. Moreover, the non-linear modelling proposed by the IDISL model outperforms the most recent modelling of GC content in these bacterial genomes (Wang et al. in Biochem. Biophys. Res. Commun. 342:681–684, 2006; Musto et al. in Biochem. Biophys. Res. Commun. 347:1–3, 2006).
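For readers unfamiliar with the quantity being modelled, the GC content of a sequence is the fraction of its nucleotides that are G or C. The sketch below is a minimal illustration, not the authors' pipeline: the function name gc_content is hypothetical, and the choice to ignore ambiguous symbols such as N is an assumption.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the unambiguous bases A, C, G, T."""
    seq = sequence.upper()
    # Count only unambiguous nucleotides; symbols such as N are ignored (assumption).
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return (seq.count("G") + seq.count("C")) / counted

# Toy sequence for illustration; real input would be a complete genome.
example = gc_content("ATGCGC")
```

Applied to each genome together with its length, this yields the (length, GC content) pairs that a model of GC content as a function of genome length would be fitted to.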

References

  • Aldous, D., & Fill, J. A. (2002). Reversible Markov chains and random walks on graphs. Berkeley: University of California.

  • Arquès, D. G., & Michel, C. J. (1993). Analytical expression of the purine/pyrimidine codon probability after and before random mutations. Bull. Math. Biol., 55, 1025–1038.

  • Arquès, D. G., & Michel, C. J. (1995). Analytical solutions of the dinucleotide probability after and before random mutations. J. Theor. Biol., 175, 533–544.

  • Bastolla, U., Moya, A., Viguera, E., & van Ham, R. C. (2004). Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J. Mol. Biol., 343, 1451–1466.

  • Benard, E., & Michel, C. J. (2009). Computation of direct and inverse mutations with the SEGM web server (Stochastic Evolution of Genetic Motifs): an application to splice sites of human genome introns. Comput. Biol. Chem., 33, 245–252.

  • Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. London: Chapman & Hall.

  • Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368–376.

  • Felsenstein, J., & Churchill, G. A. (1996). A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol., 13, 93–104.

  • Foerstner, K. U., von Mering, C., Hooper, S. D., & Bork, P. (2005). Environments shape the nucleotide composition of genomes. EMBO Rep., 6, 1208–1213.

  • Freese, E. (1962). On the evolution of base composition of DNA. J. Theor. Biol., 3, 82–101.

  • Giraud, A., Matic, I., Tenaillon, O., Clara, A., Radman, M., Fons, M., & Taddei, F. (2001). Costs and benefits of high mutation rates: adaptive evolution of bacteria in the mouse gut. Science, 291, 2606–2608.

  • Hasegawa, M., Kishino, H., & Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 22, 160–174.

  • Jukes, T. H., & Cantor, C. R. (1969). Evolution of protein molecules. In H. N. Munro (Ed.), Mammalian protein metabolism (pp. 21–132). New York: Academic Press.

  • Kelly, F. P. (1979). Reversibility and stochastic networks. Chichester: Wiley.

  • Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.

  • Kimura, M. (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. USA, 78, 454–458.

  • Koonin, E. V., & Wolf, Y. I. (2008). Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res., 36, 6688–6719.

  • Lèbre, S., & Michel, C. J. (2010). A stochastic evolution model for residue insertion–deletion independent from substitution. Comput. Biol. Chem., 34, 259–267.

  • Lee, K. Y., Wahl, R., & Barbu, E. (1956). Contenu en bases puriques et pyrimidiques des acides désoxyribonucléiques des bactéries. Ann. Inst. Pasteur, 91, 212–224.

  • Malthus, T. R. (2000). An essay on the principle of population. Library of Economics and Liberty, Liberty Fund, Inc.

  • McGuire, G., Denham, M. C., & Balding, D. J. (2001). Models of sequence evolution for DNA sequences containing gaps. Mol. Biol. Evol., 18, 481–490.

  • Metzler, D. (2003). Statistical alignment based on fragment insertion and deletion models. Bioinformatics, 19, 490–499.

  • Michel, C. J. (2007). An analytical model of gene evolution with 9 mutation parameters: an application to the amino acids coded by the common circular code. Bull. Math. Biol., 69, 677–698.

  • Miklós, I., Lunter, G. A., & Holmes, I. (2004). A “long indel” model for evolutionary sequence alignment. Mol. Biol. Evol., 21, 529–540.

  • Miklós, I., Novák, A., Satija, R., Lyngsø, R., & Hein, J. (2009). Stochastic models of sequence evolution including insertion–deletion events. Stat. Methods Med. Res., 18, 453–485.

  • Moran, N. A. (2002). Microbial minimalism: genome reduction in bacterial pathogens. Cell, 108, 583–586.

  • Musto, H., Naya, H., Zavala, A., Romero, H., Alvarez-Valín, F., & Bernardi, G. (2006). Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem. Biophys. Res. Commun., 347, 1–3.

  • Rivas, E. (2005). Evolutionary models for insertions and deletions in a probabilistic modeling framework. BMC Bioinform., 6, 63.

  • Rivas, E., & Eddy, S. R. (2008). Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput. Biol., 4(9), e1000172.

  • Rocha, E. P., & Danchin, A. (2002). Base composition bias might result from competition for metabolic resources. Trends Genet., 18, 291–294.

  • Satapathy, S. S., Dutta, M., & Ray, S. K. (2010). Variable correlation of genome GC% with transfer RNA number as well as with transfer RNA diversity among bacterial groups: α-Proteobacteria and Tenericutes exhibit strong positive correlation. Microbiol. Res., 165, 232–242.

  • Sueoka, N. (1962). On the genetic basis of variation and heterogeneity of DNA base composition. Proc. Natl. Acad. Sci. USA, 48, 582–592.

  • Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Life Sci., 17, 57–86.

  • Takahata, N., & Kimura, M. (1981). A model of evolutionary base substitutions and its application with special reference to rapid change of pseudogenes. Genetics, 98, 641–657.

  • Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol., 10, 512–526.

  • Thorne, J. L., Kishino, H., & Felsenstein, J. (1991). An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol., 33, 114–124.

  • Thorne, J. L., Kishino, H., & Felsenstein, J. (1992). Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol., 34, 3–16.

  • Wang, H. C., Susko, E., & Roger, A. J. (2006). On the correlation between genomic G+C content and optimal growth temperature in prokaryotes: data quality and confounding factors. Biochem. Biophys. Res. Commun., 342, 681–684.

  • Yang, Z. (1994). Estimating the pattern of nucleotide substitution. J. Mol. Evol., 39, 105–111.

Acknowledgement

We thank the three reviewers for their advice.

Author information

Corresponding author

Correspondence to Christian J. Michel.

About this article

Cite this article

Lèbre, S., Michel, C.J. An Evolution Model for Sequence Length Based on Residue Insertion–Deletion Independent of Substitution: An Application to the GC Content in Bacterial Genomes. Bull Math Biol 74, 1764–1788 (2012). https://doi.org/10.1007/s11538-012-9735-z

  • DOI: https://doi.org/10.1007/s11538-012-9735-z
