Using Non-Reversible Context-Dependent Evolutionary Models to Study Substitution Patterns in Primate Non-Coding Sequences

Journal of Molecular Evolution, volume 71, pages 34–50 (2010)

Abstract

We discuss the importance of non-reversible evolutionary models when analyzing context-dependence. Given the inherently non-reversible nature of the well-known CpG-methylation-deamination process in mammalian evolution, non-reversible context-dependent evolutionary models may be well suited to modeling such a process accurately. In particular, the lack of constraints on non-reversible substitution models might allow for more accurate estimation of context-dependent substitution parameters. To demonstrate this, we have developed different time-homogeneous context-dependent evolutionary models, based on existing independent evolutionary models, to analyze a large genomic dataset of primate ancestral repeats. We have calculated the difference in model fit for each of these models using Bayes Factors obtained via thermodynamic integration. We find that non-reversible context-dependent models can drastically increase model fit compared to independent models, and that this holds on two primate non-coding datasets. Further, we show that additional improvements are possible by clustering similar parameters across contexts.
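To make the reversible versus non-reversible distinction concrete, the following is a minimal sketch and not the models developed in the article, which are context-dependent. It assumes a single context-independent 4×4 rate matrix with made-up frequencies and rates; the function names (`rate_matrix`, `gtr_matrix`, `stationary`, `detailed_balance_gap`) are hypothetical. A reversible (GTR-style) matrix satisfies detailed balance, pi_i q_ij = pi_j q_ji, while a matrix in which only the C to T rate is raised, loosely mimicking deamination, does not.

```python
# Illustrative only: context-independent 4x4 matrices with made-up rates.
NUCS = "ACGT"


def rate_matrix(off_diagonal):
    """Build a 4x4 rate matrix; each diagonal entry makes its row sum to zero."""
    q = [[0.0] * 4 for _ in range(4)]
    for (src, dst), rate in off_diagonal.items():
        q[NUCS.index(src)][NUCS.index(dst)] = rate
    for i in range(4):
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def gtr_matrix(freqs, exch):
    """Reversible GTR-style matrix: q_ij = s_ij * pi_j with symmetric s_ij."""
    rates = {}
    for pair, s in exch.items():
        a, b = pair[0], pair[1]
        rates[(a, b)] = s * freqs[b]
        rates[(b, a)] = s * freqs[a]
    return rate_matrix(rates)


def stationary(q, iters=5000):
    """Stationary distribution by power iteration on P = I + Q / mu."""
    mu = 1.1 * max(-q[i][i] for i in range(4))
    p = [[(1.0 if i == j else 0.0) + q[i][j] / mu for j in range(4)]
         for i in range(4)]
    pi = [0.25] * 4
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(4)) for j in range(4)]
    total = sum(pi)
    return [x / total for x in pi]


def detailed_balance_gap(q):
    """Largest |pi_i q_ij - pi_j q_ji|; zero (up to rounding) iff reversible."""
    pi = stationary(q)
    return max(abs(pi[i] * q[i][j] - pi[j] * q[j][i])
               for i in range(4) for j in range(i + 1, 4))


# Assumed example values, not estimates from the article.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
exch = {"AC": 1.0, "AG": 4.0, "AT": 1.0, "CG": 1.0, "CT": 4.0, "GT": 1.0}
q_rev = gtr_matrix(freqs, exch)

# Hypothetical non-reversible variant: raise only the C->T rate.
q_nonrev = [row[:] for row in q_rev]
c, t = NUCS.index("C"), NUCS.index("T")
q_nonrev[c][t] *= 5.0
q_nonrev[c][c] = -sum(q_nonrev[c][j] for j in range(4) if j != c)

gap_rev = detailed_balance_gap(q_rev)
gap_nonrev = detailed_balance_gap(q_nonrev)
print("reversible gap:", gap_rev)
print("non-reversible gap:", gap_nonrev)
```

A general non-reversible 4×4 matrix has 12 free off-diagonal rates, whereas the reversible form above is constrained to six symmetric exchangeabilities plus the equilibrium frequencies. That difference in constraints is what the abstract refers to when it says non-reversible models might allow more accurate parameter estimation.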

Abbreviations

GTR: General time-reversible

BF: Bayes Factor

QE: Quasistatic estimate

Acknowledgments

Computational resources and services used in this study were provided by Ghent University. We would like to thank Stijn De Weirdt in particular for his assistance in using Ghent University’s High Performance Computing facility. We would also like to thank Hervé Philippe and one anonymous referee for providing useful comments on a first version. Yves Van de Peer acknowledges support from an Interuniversity Attraction Pole (IUAP) grant for the BioMaGNet project (Bioinformatics and Modelling: from Genomes to Networks, ref. p6/25). Stijn Vansteelandt acknowledges support from IAP research network Grant No. P06/03 from the Belgian government (Belgian Science Policy). We acknowledge the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”).

Author information

Corresponding author

Correspondence to Yves Van de Peer.

About this article

Cite this article

Baele, G., Van de Peer, Y. & Vansteelandt, S. Using Non-Reversible Context-Dependent Evolutionary Models to Study Substitution Patterns in Primate Non-Coding Sequences. J Mol Evol 71, 34–50 (2010). https://doi.org/10.1007/s00239-010-9362-y

  • DOI: https://doi.org/10.1007/s00239-010-9362-y
