Distribution of distances between topologies and its effect on detection of phylogenetic recombination

  • Leonardo de Oliveira Martins
  • Hirohisa Kishino


Inferences about the evolutionary history of biological sequence data are greatly influenced by the presence of recombination, which tends to disrupt the phylogenetic signal. Current recombination detection procedures focus on the phylogenetic disagreement of the data along the aligned sequences, but only recently has the link between the quantification of this disagreement and the strength of the recombination been realised. We previously described a hierarchical Bayesian procedure based on the distance between topologies of neighbouring sites and a Poisson-like prior for these distances. Here, by analysing datasets simulated under a complex evolutionary model, we confirm the improvement that this topology distance and its prior provide over existing methods that neglect this information. We also show how to obtain a mosaic structure representative of the posterior sample using a newly developed centroid method.
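As a rough illustration of what a centroid summary of a posterior sample can look like, the sketch below picks, from a set of sampled mosaic structures, the one with the smallest total distance to all the others. It is only a minimal sketch under stated assumptions and not the authors' method: each sample is assumed to be a vector of topology distances at successive segment boundaries, the L1 distance between vectors is an arbitrary choice, and the names `l1_distance`, `centroid_sample` and the toy `posterior` values are hypothetical.

```python
from typing import List, Sequence, Tuple


def l1_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences between two equal-length distance vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b))


def centroid_sample(samples: List[Tuple[int, ...]]) -> Tuple[int, ...]:
    """Return the sample minimising the total L1 distance to all samples.

    Assumption: this restricts the centroid to the sampled values, which
    is one simple way to obtain a representative point of the sample.
    Ties are resolved in favour of the earliest sample.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    return min(samples, key=lambda s: sum(l1_distance(s, t) for t in samples))


# Hypothetical posterior sample: each tuple holds the topology distance
# at five consecutive segment boundaries (0 means no breakpoint).
posterior = [
    (0, 1, 0, 0, 2),
    (0, 1, 0, 0, 1),
    (0, 0, 1, 0, 1),
    (0, 1, 0, 0, 1),
]

mosaic = centroid_sample(posterior)
print(mosaic)
```

Restricting the search to the sampled vectors keeps the example short. A centroid defined over the whole space of mosaic structures would need a different search strategy.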


Viral recombination, SPR distance, Markov chain Monte Carlo, Phylogenetics



Copyright information

© The Institute of Statistical Mathematics, Tokyo 2009

Authors and Affiliations

  • Leonardo de Oliveira Martins (1, corresponding author)
  • Hirohisa Kishino (2)
  1. Department of Biochemistry, Genetics and Immunology, Faculty of Biology, University of Vigo, Vigo, Spain
  2. Graduate School of Agriculture and Life Sciences, University of Tokyo, Tokyo, Japan
