Journal of Molecular Evolution, Volume 71, Issue 1, pp 34–50

Using Non-Reversible Context-Dependent Evolutionary Models to Study Substitution Patterns in Primate Non-Coding Sequences

  • Guy Baele
  • Yves Van de Peer
  • Stijn Vansteelandt


We discuss the importance of non-reversible evolutionary models when analyzing context-dependence. Given the inherently non-reversible nature of the well-known CpG-methylation-deamination process in mammalian evolution, non-reversible context-dependent evolutionary models may be well suited to modeling such a process accurately. In particular, the lack of constraints on non-reversible substitution models might allow for more accurate estimation of context-dependent substitution parameters. To demonstrate this, we have developed different time-homogeneous context-dependent evolutionary models, based on existing independent evolutionary models, to analyze a large genomic dataset of primate ancestral repeats. We have calculated the difference in model fit for each of these models using Bayes Factors obtained via thermodynamic integration. We find that non-reversible context-dependent models can drastically increase model fit compared to independent models, and that this holds for two primate non-coding datasets. Further, we show that additional improvements are possible by clustering similar parameters across contexts.
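To make the model-comparison step concrete, the following is a minimal sketch of how thermodynamic integration can yield a log Bayes Factor. It is not the authors' implementation: it uses a toy coin-flip model rather than a substitution model, it integrates along the prior-to-posterior path separately for one model rather than along a path between models, and it computes the power-posterior expectations exactly on a parameter grid instead of by MCMC. All function names, the grid size and the example counts are assumptions made for illustration.

```python
import math


def grid_model(heads, tails, n_grid=200):
    """Toy 'free' model: success probability theta on a midpoint grid, uniform prior."""
    thetas = [(k + 0.5) / n_grid for k in range(n_grid)]
    log_liks = [heads * math.log(t) + tails * math.log(1.0 - t) for t in thetas]
    log_prior = [math.log(1.0 / n_grid)] * n_grid
    return log_liks, log_prior


def expected_log_lik(beta, log_liks, log_prior):
    """E_beta[log L] under the power posterior, which is proportional to L^beta * prior."""
    log_w = [beta * ll + lp for ll, lp in zip(log_liks, log_prior)]
    m = max(log_w)  # subtract the maximum for numerical stability
    w = [math.exp(x - m) for x in log_w]
    return sum(wi * ll for wi, ll in zip(w, log_liks)) / sum(w)


def log_marginal_ti(log_liks, log_prior, n_steps=2000):
    """Thermodynamic integration: log Z = integral over beta in [0, 1] of E_beta[log L].

    The integral is approximated with the trapezoidal rule on a uniform beta grid.
    """
    h = 1.0 / n_steps
    vals = [expected_log_lik(i * h, log_liks, log_prior) for i in range(n_steps + 1)]
    return h * (0.5 * vals[0] + sum(vals[1:-1]) + 0.5 * vals[-1])


def log_marginal_exact(log_liks, log_prior):
    """Direct log-sum-exp of prior * likelihood over the grid, used as a check."""
    terms = [ll + lp for ll, lp in zip(log_liks, log_prior)]
    m = max(terms)
    return m + math.log(sum(math.exp(t - m) for t in terms))


def log_bayes_factor(heads, tails):
    """Log Bayes Factor of the free-theta model against a fixed theta = 0.5 model."""
    log_liks, log_prior = grid_model(heads, tails)
    log_z_free = log_marginal_ti(log_liks, log_prior)
    log_z_fixed = (heads + tails) * math.log(0.5)  # no free parameters to integrate
    return log_z_free - log_z_fixed


if __name__ == "__main__":
    log_liks, log_prior = grid_model(70, 30)
    print("TI estimate of log Z:", log_marginal_ti(log_liks, log_prior))
    print("Exact grid log Z:   ", log_marginal_exact(log_liks, log_prior))
    print("log Bayes Factor:   ", log_bayes_factor(70, 30))
```

A positive log Bayes Factor favors the more flexible model and a negative one favors the constrained model. In a real analysis the expectations at each value of beta would be estimated from MCMC samples, so the estimate would carry Monte Carlo error in addition to the discretization error of the beta grid.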


  • Context-dependent evolution
  • Nearest-neighbor influences
  • Context effect
  • Bayes Factor
  • Thermodynamic integration



  • General time-reversible
  • Bayes Factor
  • Quasistatic estimate



Computational resources and services used in this study were provided by Ghent University. We would like to thank Stijn De Weirdt in particular for his assistance in using Ghent University’s High Performance Computing facility. We would also like to thank Hervé Philippe and one anonymous referee for providing useful comments on a first version. Yves Van de Peer acknowledges support from an Interuniversity Attraction Pole (IUAP) grant for the BioMaGNet project (Bioinformatics and Modelling: from Genomes to Networks, ref. p6/25). Stijn Vansteelandt acknowledges support from IAP research network Grant No. P06/03 from the Belgian government (Belgian Science Policy). We acknowledge the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”).



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Guy Baele (1, 2)
  • Yves Van de Peer (1, 2)
  • Stijn Vansteelandt (3)

  1. Department of Plant Systems Biology, VIB, Ghent University, Ghent, Belgium
  2. Bioinformatics and Evolutionary Genomics, Department of Molecular Genetics, Ghent University, Ghent, Belgium
  3. Department of Applied Mathematics and Computer Science, Ghent University, Ghent, Belgium
