An Evolution Model for Sequence Length Based on Residue Insertion–Deletion Independent of Substitution: An Application to the GC Content in Bacterial Genomes

Lèbre, Sophie; Michel, Christian J.

doi:10.1007/s11538-012-9735-z

Original Article
Published: 30 May 2012

Volume 74, pages 1764–1788, (2012)
Bulletin of Mathematical Biology

Abstract

We introduce here a gene evolution model which extends the time-continuous stochastic IDIS model (Lèbre and Michel in Comput. Biol. Chem. 34:259–267, 2010) to sequence length. This new IDISL (Insertion Deletion Independent of Substitution based on sequence Length) model gives an analytical expression of the residue occurrence probability p(l) at sequence length l depending on stochastically independent processes of substitution, insertion, and deletion. Furthermore, in contrast to all mathematical models in this research field, the substitution, insertion, and deletion parameters of the IDISL model are independent of each other. For any diagonalizable substitution matrix M, the residue occurrence probability p(l) is given as a function of the eigenvalues of M, the eigenvector matrix of M, a vector r of the residue insertion rates, a deletion rate d (unlike our previous IDIS model), and a vector of the initial residue occurrence probability p(l₀) at sequence length l₀.

In another difference from the classical evolution approaches, which mainly focus on sequence alignment, the IDIS class of models allows a mathematical analysis of the behavior of the residue occurrence probability according to either evolution time or sequence length. The length parameter can be associated with any nucleotide region: genes, genomes, introns, repeats, 5′ and 3′ regions, etc. Three properties of the IDISL model are given in relation to the sequence length l: parameter scale, inverse evolution, and residue equilibrium distribution. Nucleotide occurrence probabilities are given in the particular case of the IDISL-HKY model, i.e. the IDISL model associated with the HKY asymmetric substitution matrix (Hasegawa et al. in J. Mol. Evol. 22:160–174, 1985).
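As an illustration of what an HKY-type asymmetric substitution matrix looks like, the following minimal sketch builds the HKY85 rate matrix in its common textbook parameterization. The function name, the nucleotide order A, C, G, T, the row convention (rows sum to zero) and the example values of the base frequencies and of kappa are assumptions made for this example; the article's own matrix M may use a different convention and parameterization, and the IDISL insertion and deletion terms are not modelled here.

```python
NUCLEOTIDES = "ACGT"
# Transitions are purine<->purine (A<->G) and pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def hky_rate_matrix(pi, kappa):
    """Return a 4x4 HKY85 rate matrix as nested lists (order A, C, G, T).

    Off-diagonal entry (i, j) is kappa * pi[j] for a transition and pi[j]
    for a transversion; each diagonal entry makes its row sum to zero.
    """
    if len(pi) != 4 or abs(sum(pi) - 1.0) > 1e-9:
        raise ValueError("pi must hold four base frequencies summing to 1")
    n = len(NUCLEOTIDES)
    q = [[0.0] * n for _ in range(n)]
    for i, x in enumerate(NUCLEOTIDES):
        for j, y in enumerate(NUCLEOTIDES):
            if i != j:
                rate = kappa if (x, y) in TRANSITIONS else 1.0
                q[i][j] = rate * pi[j]
        # The diagonal is still 0.0 here, so this sums the off-diagonal rates.
        q[i][i] = -sum(q[i])
    return q


# Hypothetical example values, not taken from the article.
pi = [0.1, 0.3, 0.3, 0.3]
Q = hky_rate_matrix(pi, kappa=2.0)

# Asymmetry: the A->C rate differs from the C->A rate when pi[C] != pi[A].
print(Q[0][1], Q[1][0])
```

With unequal base frequencies the matrix is not symmetric, yet it still satisfies detailed balance (pi[i] * Q[i][j] equals pi[j] * Q[j][i]), which is the sense in which HKY is asymmetric but reversible.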

An application of the IDISL model is developed for a massive statistical analysis of GC content in all complete bacterial genomes available to date (894 non-anaerobic and anaerobic genomes). The IDISL-HKY model confirms the increase of the GC content with the genome length for two non-anaerobic taxonomic groups of bacterial genomes. Moreover, the non-linear modelling proposed by the IDISL model outperforms the most recent modelling of GC content in these bacterial genomes (Wang et al. in Biochem. Biophys. Res. Commun. 342:681–684, 2006; Musto et al. in Biochem. Biophys. Res. Commun. 347:1–3, 2006).
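The quantity analysed in this application, the GC content, is the fraction of G and C among the nucleotides of a sequence; in a four-nucleotide model it corresponds to the sum of the occurrence probabilities of C and G. A minimal sketch of the empirical computation follows. The function name and the choice to ignore ambiguity codes such as N are assumptions of this example, not details taken from the article.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the A, C, G, T bases of a sequence.

    Characters other than A, C, G and T (for example the ambiguity code N)
    are ignored, which is an assumption of this sketch.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return gc / total


# Toy sequence for illustration only: 2 of the 4 bases are G or C.
print(gc_content("ATGC"))
```

Applied to each complete genome, such a function gives one (genome length, GC content) point per genome, which is the kind of data a length-dependent model can be compared against.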

References

Aldous, D., & Fill, J. A. (2002). Reversible Markov chains and random walks on graphs. Berkeley: University of California.
Arquès, D. G., & Michel, C. J. (1993). Analytical expression of the purine/pyrimidine codon probability after and before random mutations. Bull. Math. Biol., 55, 1025–1038.
Arquès, D. G., & Michel, C. J. (1995). Analytical solutions of the dinucleotide probability after and before random mutations. J. Theor. Biol., 175, 533–544.
Bastolla, U., Moya, A., Viguera, E., & van Ham, R. C. (2004). Genomic determinants of protein folding thermodynamics in prokaryotic organisms. J. Mol. Biol., 343, 1451–1466.
Benard, E., & Michel, C. J. (2009). Computation of direct and inverse mutations with the SEGM web server (Stochastic Evolution of Genetic Motifs): an application to splice sites of human genome introns. Comput. Biol. Chem., 33, 245–252.
Cook, R. D., & Weisberg, S. (1982). Residuals and influence in regression. London: Chapman & Hall.
Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. J. Mol. Evol., 17, 368–376.
Felsenstein, J., & Churchill, G. A. (1996). A hidden Markov model approach to variation among sites in rate of evolution. Mol. Biol. Evol., 13, 93–104.
Foerstner, K. U., von Mering, C., Hooper, S. D., & Bork, P. (2005). Environments shape the nucleotide composition of genomes. EMBO Rep., 6, 1208–1213.
Freese, E. (1962). On the evolution of base composition of DNA. J. Theor. Biol., 3, 82–101.
Giraud, A., Matic, I., Tenaillon, O., Clara, A., Radman, M., Fons, M., & Taddei, F. (2001). Costs and benefits of high mutation rates: adaptive evolution of bacteria in the mouse gut. Science, 291, 2606–2608.
Hasegawa, M., Kishino, H., & Yano, T. (1985). Dating of the human-ape splitting by a molecular clock of mitochondrial DNA. J. Mol. Evol., 22, 160–174.
Jukes, T. H., & Cantor, C. R. (1969). Evolution of protein molecules. In H. N. Munro (Ed.), Mammalian protein metabolism (pp. 21–132). New York: Academic Press.
Kelly, F. P. (1979). Reversibility and stochastic networks. Chichester: Wiley.
Kimura, M. (1980). A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.
Kimura, M. (1981). Estimation of evolutionary distances between homologous nucleotide sequences. Proc. Natl. Acad. Sci. USA, 78, 454–458.
Koonin, E. V., & Wolf, Y. I. (2008). Genomics of bacteria and archaea: the emerging dynamic view of the prokaryotic world. Nucleic Acids Res., 36, 6688–6719.
Lèbre, S., & Michel, C. J. (2010). A stochastic evolution model for residue insertion–deletion independent from substitution. Comput. Biol. Chem., 34, 259–267.
Lee, K. Y., Wahl, R., & Barbu, E. (1956). Contenu en bases puriques et pyrimidiques des acides désoxyribonucléiques des bactéries. Ann. Inst. Pasteur, 91, 212–224.
Malthus, T. R. (2000). An essay on the principle of population. Library of Economics and Liberty, Liberty Fund, Inc.
McGuire, G., Denham, M. C., & Balding, D. J. (2001). Models of sequence evolution for DNA sequences containing gaps. Mol. Biol. Evol., 18, 481–490.
Metzler, D. (2003). Statistical alignment based on fragment insertion and deletion models. Bioinformatics, 19, 490–499.
Michel, C. J. (2007). An analytical model of gene evolution with 9 mutation parameters: an application to the amino acids coded by the common circular code. Bull. Math. Biol., 69, 677–698.
Miklós, I., Lunter, G. A., & Holmes, I. (2004). A “long indel” model for evolutionary sequence alignment. Mol. Biol. Evol., 21, 529–540.
Miklós, I., Novák, A., Satija, R., Lyngsø, R., & Hein, J. (2009). Stochastic models of sequence evolution including insertion–deletion events. Stat. Methods Med. Res., 18, 453–485.
Moran, N. A. (2002). Microbial minimalism: genome reduction in bacterial pathogens. Cell, 108, 583–586.
Musto, H., Naya, H., Zavala, A., Romero, H., Alvarez-Valín, F., & Bernardi, G. (2006). Genomic GC level, optimal growth temperature, and genome size in prokaryotes. Biochem. Biophys. Res. Commun., 347, 1–3.
Rivas, E. (2005). Evolutionary models for insertions and deletions in a probabilistic modeling framework. BMC Bioinform., 6, 63.
Rivas, E., & Eddy, S. R. (2008). Probabilistic phylogenetic inference with insertions and deletions. PLoS Comput. Biol., 4(9), e1000172.
Rocha, E. P., & Danchin, A. (2002). Base composition bias might result from competition for metabolic resources. Trends Genet., 18, 291–294.
Satapathy, S. S., Dutta, M., & Ray, S. K. (2010). Variable correlation of genome GC% with transfer RNA number as well as with transfer RNA diversity among bacterial groups: α-Proteobacteria and Tenericutes exhibit strong positive correlation. Microbiol. Res., 165, 232–242.
Sueoka, N. (1962). On the genetic basis of variation and heterogeneity of DNA base composition. Proc. Natl. Acad. Sci. USA, 48, 582–592.
Tavaré, S. (1986). Some probabilistic and statistical problems in the analysis of DNA sequences. Lect. Math. Life Sci., 17, 57–86.
Takahata, N., & Kimura, M. (1981). A model of evolutionary base substitutions and its application with special reference to rapid change of pseudogenes. Genetics, 98, 641–657.
Tamura, K., & Nei, M. (1993). Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol. Biol. Evol., 10, 512–526.
Thorne, J. L., Kishino, H., & Felsenstein, J. (1991). An evolutionary model for maximum likelihood alignment of DNA sequences. J. Mol. Evol., 33, 114–124.
Thorne, J. L., Kishino, H., & Felsenstein, J. (1992). Inching toward reality: an improved likelihood model of sequence evolution. J. Mol. Evol., 34, 3–16.
Wang, H. C., Susko, E., & Roger, A. J. (2006). On the correlation between genomic G+C content and optimal growth temperature in prokaryotes: data quality and confounding factors. Biochem. Biophys. Res. Commun., 342, 681–684.
Yang, Z. (1994). Estimating the pattern of nucleotide substitution. J. Mol. Evol., 39, 105–111.

Acknowledgement

We thank the three reviewers for their advice.

Author information

Authors and Affiliations

Equipe de Bioinformatique Théorique, BFO, LSIIT (UMR 7005), Université de Strasbourg, Pôle API, Boulevard Sébastien Brant, 67400, Illkirch, France
Sophie Lèbre & Christian J. Michel

Corresponding author

Correspondence to Christian J. Michel.

About this article

Cite this article

Lèbre, S., Michel, C.J. An Evolution Model for Sequence Length Based on Residue Insertion–Deletion Independent of Substitution: An Application to the GC Content in Bacterial Genomes. Bull Math Biol 74, 1764–1788 (2012). https://doi.org/10.1007/s11538-012-9735-z

Received: 03 July 2011
Accepted: 01 May 2012
Published: 30 May 2012
Issue Date: August 2012
DOI: https://doi.org/10.1007/s11538-012-9735-z
