Abstract
Inferences about the evolutionary history of biological sequence data are strongly influenced by recombination, which tends to disrupt the phylogenetic signal. Current recombination detection procedures focus on phylogenetic disagreement along the aligned sequences, but only recently has the link between the quantification of this disagreement and the strength of recombination been realised. We previously described a hierarchical Bayesian procedure based on the distance between the topologies of neighbouring sites and a Poisson-like prior on these distances. Here, by analysing datasets simulated under a complex evolutionary model, we confirm that this topology distance and its prior improve on existing methods that neglect this information. We also show how to obtain a mosaic structure representative of the posterior sample using a newly developed centroid method.
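As a rough illustration of what a centroid summary of a posterior sample can look like, the sketch below computes a centroid under Hamming distance. It is not the method developed in the article, whose details are not given in this abstract. It assumes, purely for illustration, that each posterior sample is reduced to a 0/1 vector with one entry per boundary between neighbouring sites (1 for a sampled breakpoint). The names `centroid_breakpoints`, `total_hamming` and `posterior_sample` are hypothetical, and the sample values are toy data.

```python
from typing import List, Sequence


def centroid_breakpoints(samples: Sequence[Sequence[int]]) -> List[int]:
    """Return the 0/1 vector minimising the total Hamming distance to the samples.

    Assumption: each sample has one entry per boundary between neighbouring
    sites, with 1 marking a sampled breakpoint and 0 marking none.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    n_samples = len(samples)
    n_boundaries = len(samples[0])
    if any(len(s) != n_boundaries for s in samples):
        raise ValueError("all samples must have the same length")
    # Under Hamming distance each position can be optimised separately:
    # keep a breakpoint only if more than half of the samples contain it.
    counts = [sum(s[j] for s in samples) for j in range(n_boundaries)]
    return [1 if 2 * c > n_samples else 0 for c in counts]


def total_hamming(candidate: Sequence[int], samples: Sequence[Sequence[int]]) -> int:
    """Sum of Hamming distances from the candidate to every sample."""
    return sum(sum(a != b for a, b in zip(candidate, s)) for s in samples)


# Toy posterior sample (made-up values): 4 draws over 5 site boundaries.
posterior_sample = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
centroid = centroid_breakpoints(posterior_sample)
print(centroid, total_hamming(centroid, posterior_sample))
```

The majority rule works here because Hamming distance decomposes over positions. A distance defined on tree topologies, as the abstract describes, would in general not decompose this way and would need a different search.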
About this article
Cite this article
de Oliveira Martins, L., Kishino, H. Distribution of distances between topologies and its effect on detection of phylogenetic recombination. Ann Inst Stat Math 62, 145–159 (2010). https://doi.org/10.1007/s10463-009-0259-8
DOI: https://doi.org/10.1007/s10463-009-0259-8