Using Non-Reversible Context-Dependent Evolutionary Models to Study Substitution Patterns in Primate Non-Coding Sequences
We discuss the importance of non-reversible evolutionary models when analyzing context-dependence. Given the inherently non-reversible nature of the well-known CpG-methylation-deamination process in mammalian evolution, non-reversible context-dependent evolutionary models may be well suited to model this process accurately. In particular, the lack of a reversibility constraint in non-reversible substitution models might allow for more accurate estimation of context-dependent substitution parameters. To demonstrate this, we have developed several time-homogeneous context-dependent evolutionary models, based on existing independent evolutionary models, to analyze a large genomic dataset of primate ancestral repeats. We have calculated the difference in model fit for each of these models using Bayes Factors obtained via thermodynamic integration. We find that, on two primate non-coding datasets, non-reversible context-dependent models can drastically increase model fit compared to independent models. Finally, we show that further improvements are possible by clustering similar parameters across contexts.
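Thermodynamic integration (path sampling) is only named above. The sketch below illustrates the idea under loudly stated assumptions: a toy conjugate normal model stands in for the context-dependent substitution models, the power posterior is sampled exactly rather than by MCMC, and the β schedule (k/K)^5, the number of steps and the number of draws are illustrative choices, not the settings used in this study. All function names (`log_likelihood`, `exact_log_marginal`, `ti_log_marginal`) are hypothetical. Because the toy model has a closed-form marginal likelihood, the estimate can be checked directly.

```python
import math
import random

# Hypothetical toy model (a stand-in for a substitution model, for illustration only):
#   y_i ~ Normal(mu, 1), i = 1..n;  prior mu ~ Normal(0, tau^2).
# The data enter only through n, s = sum(y) and syy = sum(y**2).


def log_likelihood(mu, n, s, syy):
    """log p(y | mu) for n unit-variance normal observations."""
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * (syy - 2 * mu * s + n * mu * mu)


def exact_log_marginal(n, s, syy, tau):
    """Closed-form log p(y), since y ~ Normal(0, I + tau^2 * 1 1^T)."""
    a = 1.0 + n * tau * tau              # determinant of the covariance matrix
    quad = syy - tau * tau * s * s / a   # y^T (covariance)^-1 y via Sherman-Morrison
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * math.log(a) - 0.5 * quad


def ti_log_marginal(n, s, syy, tau, n_steps=100, power=5.0, draws=2000, seed=0):
    """Estimate log p(y) = integral_0^1 E_beta[log p(y | mu)] d(beta), where E_beta is
    the expectation under the power posterior p(y | mu)^beta * p(mu) / z(beta)."""
    rng = random.Random(seed)
    # Grid with more points near beta = 0, where the integrand changes fastest.
    betas = [(k / n_steps) ** power for k in range(n_steps + 1)]
    expected = []
    for beta in betas:
        # In this conjugate toy model the power posterior is Normal, so it can be
        # sampled exactly; a real analysis would run MCMC at each beta instead.
        prec = beta * n + 1.0 / (tau * tau)
        mean = beta * s / prec
        sd = 1.0 / math.sqrt(prec)
        total = sum(log_likelihood(rng.gauss(mean, sd), n, s, syy) for _ in range(draws))
        expected.append(total / draws)
    # Trapezoidal rule over the non-uniform beta grid.
    return sum(0.5 * (b1 - b0) * (e0 + e1)
               for b0, b1, e0, e1 in zip(betas, betas[1:], expected, expected[1:]))


# Usage: compare two priors (M1: tau = 1, M2: tau = 10) on simulated data.
data_rng = random.Random(1)
y = [data_rng.gauss(0.7, 1.0) for _ in range(20)]
n, s, syy = len(y), sum(y), sum(v * v for v in y)

log_bf_ti = ti_log_marginal(n, s, syy, 1.0) - ti_log_marginal(n, s, syy, 10.0)
log_bf_exact = exact_log_marginal(n, s, syy, 1.0) - exact_log_marginal(n, s, syy, 10.0)
print(f"log Bayes factor M1 vs M2: TI {log_bf_ti:.3f}, exact {log_bf_exact:.3f}")
```

The log Bayes factor is simply the difference between the two estimated log marginal likelihoods. In a real analysis each E_β term would come from a separate MCMC run at that β, which is where most of the computational cost lies.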
Keywords: Context-dependent evolution; Nearest-neighbor influences; Context effect; Bayes Factor; Thermodynamic integration
Computational resources and services used in this study were provided by Ghent University. We would like to thank Stijn De Weirdt in particular for his assistance in using Ghent University’s High Performance Computing facility. We would also like to thank Hervé Philippe and one anonymous referee for providing useful comments on a first version. Yves Van de Peer acknowledges support from an Interuniversity Attraction Pole (IUAP) grant for the BioMaGNet project (Bioinformatics and Modelling: from Genomes to Networks, ref. p6/25). Stijn Vansteelandt acknowledges support from IAP research network Grant No. P06/03 from the Belgian government (Belgian Science Policy). We acknowledge the support of Ghent University (Multidisciplinary Research Partnership “Bioinformatics: from nucleotides to networks”).