Abstract
We discuss the importance of non-reversible evolutionary models when analyzing context-dependent substitution patterns. Because the well-known CpG-methylation-deamination process in mammalian evolution is inherently non-reversible, non-reversible context-dependent evolutionary models may be better suited to model such a process accurately. In particular, because non-reversible substitution models are free of the constraints that time-reversibility imposes, they may allow more accurate estimation of context-dependent substitution parameters. To demonstrate this, we have developed several time-homogeneous context-dependent evolutionary models, based on existing independent evolutionary models, and used them to analyze a large genomic dataset of primate ancestral repeats. We compare the model fit of each of these models using Bayes factors obtained via thermodynamic integration. On two primate non-coding datasets, we find that non-reversible context-dependent models can drastically improve model fit compared with independent models. Finally, we show that model fit can be improved further by clustering similar parameters across contexts.
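The reversibility constraint mentioned above can be made concrete with a minimal, independent-site sketch; this is not the authors' context-dependent implementation. A time-reversible model such as the GTR satisfies detailed balance, π_i Q_ij = π_j Q_ji, for its equilibrium distribution π, whereas a general non-reversible rate matrix need not. The helper names `gtr_matrix`, `stationary` and `is_reversible`, and all rate values, are hypothetical and purely illustrative. The stationary distribution is obtained by uniformization plus power iteration, which is one convenient choice among several.

```python
from itertools import combinations

BASES = "ACGT"


def gtr_matrix(exch, pi):
    """Hypothetical helper: GTR rate matrix with Q[i][j] = r_ij * pi_j for i != j.

    `exch` maps base pairs such as ("A", "C") to a symmetric
    exchangeability r_ij; `pi` is the equilibrium distribution.
    """
    q = [[0.0] * 4 for _ in range(4)]
    for (a, b), r in exch.items():
        i, j = BASES.index(a), BASES.index(b)
        q[i][j] = r * pi[j]
        q[j][i] = r * pi[i]
    for i in range(4):
        # The diagonal makes each row sum to zero.
        q[i][i] = -sum(q[i][j] for j in range(4) if j != i)
    return q


def stationary(q, iters=10_000):
    """Hypothetical helper: stationary distribution pi with pi Q = 0.

    Uses uniformization, P = I + Q / lam with lam larger than every exit
    rate, so P is a stochastic matrix with a positive diagonal; power
    iteration on P then converges for an irreducible Q.
    """
    n = len(q)
    lam = 1.1 * max(-q[i][i] for i in range(n))
    p = [[(1.0 if i == j else 0.0) + q[i][j] / lam for j in range(n)]
         for i in range(n)]
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * p[i][j] for i in range(n)) for j in range(n)]
    return pi


def is_reversible(q, pi, tol=1e-9):
    """Hypothetical helper: check detailed balance pi_i Q_ij == pi_j Q_ji."""
    return all(abs(pi[i] * q[i][j] - pi[j] * q[j][i]) < tol
               for i, j in combinations(range(len(q)), 2))


# Illustrative GTR parameters (not estimates from this study).
pi = [0.3, 0.2, 0.2, 0.3]
exch = {("A", "C"): 1.0, ("A", "G"): 4.0, ("A", "T"): 1.0,
        ("C", "G"): 1.0, ("C", "T"): 4.0, ("G", "T"): 1.0}
q_gtr = gtr_matrix(exch, pi)
print(is_reversible(q_gtr, stationary(q_gtr)))  # True

# Illustrative non-reversible matrix with a cyclic bias A->C->G->T->A.
q_nr = [[-1.2, 1.0, 0.1, 0.1],
        [0.1, -1.2, 1.0, 0.1],
        [0.1, 0.1, -1.2, 1.0],
        [1.0, 0.1, 0.1, -1.2]]
print(is_reversible(q_nr, stationary(q_nr)))  # False
```

Every GTR matrix passes this check by construction, because r_ij is symmetric. The non-reversible matrix has uniform equilibrium frequencies yet fails the check, since the flux A→C (0.25 × 1.0) exceeds the reverse flux C→A (0.25 × 0.1); such a net circulation cannot be represented by a reversible parameterization.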
Abbreviations
- GTR: General time-reversible
- BF: Bayes factor
- QE: Quasistatic estimate
Acknowledgments
Computational resources and services used in this study were provided by Ghent University. We would like to thank Stijn De Weirdt in particular for his assistance in using Ghent University’s High Performance Computing facility. We would also like to thank Hervé Philippe and one anonymous referee for providing useful comments on an earlier version of this manuscript. Yves Van de Peer acknowledges support from an Interuniversity Attraction Pole (IUAP) grant for the BioMaGNet project (Bioinformatics and Modelling: from Genomes to Networks, ref. p6/25). Stijn Vansteelandt acknowledges support from IAP research network Grant No. P06/03 from the Belgian government (Belgian Science Policy). We acknowledge the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”).
Cite this article
Baele, G., Van de Peer, Y. & Vansteelandt, S. Using Non-Reversible Context-Dependent Evolutionary Models to Study Substitution Patterns in Primate Non-Coding Sequences. J Mol Evol 71, 34–50 (2010). https://doi.org/10.1007/s00239-010-9362-y