
Distribution of distances between topologies and its effect on detection of phylogenetic recombination

Annals of the Institute of Statistical Mathematics

Abstract

Inferences about the evolutionary history of biological sequence data are greatly influenced by the presence of recombination, which tends to disrupt the phylogenetic signal. Current recombination detection procedures focus on the phylogenetic disagreement of the data along the aligned sequences, but only recently has the link between the quantification of this disagreement and the strength of the recombination been realised. We previously described a hierarchical Bayesian procedure based on the distance between topologies of neighbouring sites and a Poisson-like prior for these distances. Here, by analysing datasets simulated under a complex evolutionary model, we confirm the improvement that this topology distance and its prior provide over existing methods that neglect this information. We also show how to obtain a mosaic structure representative of the posterior sample using a newly developed centroid method.
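To make the idea of a centroid summary concrete, the following is a minimal sketch and not the procedure of the article itself. It assumes, purely for illustration, that each posterior sample of the mosaic structure is reduced to a binary vector marking which boundaries between neighbouring segments carry a breakpoint. The article's mosaic also involves topology distances, which are ignored here. Under that simplification, the vector minimising the total Hamming distance to the sample is the per-boundary majority vote. The names `hamming_centroid`, `total_hamming`, and `posterior_sample` are hypothetical.

```python
from typing import List, Sequence


def hamming_centroid(samples: Sequence[Sequence[int]]) -> List[int]:
    """Return the binary vector minimising total Hamming distance to the samples.

    Assumption (not from the article): each sample is a 0/1 vector with one
    entry per boundary, where 1 marks a breakpoint at that boundary.
    """
    if not samples:
        raise ValueError("need at least one posterior sample")
    n_boundaries = len(samples[0])
    if any(len(s) != n_boundaries for s in samples):
        raise ValueError("all samples must have the same length")
    n = len(samples)
    # Each coordinate contributes independently to the Hamming distance, so
    # the minimiser is the majority value at each boundary (ties go to 0).
    return [
        1 if 2 * sum(s[j] for s in samples) > n else 0
        for j in range(n_boundaries)
    ]


def total_hamming(candidate: Sequence[int], samples: Sequence[Sequence[int]]) -> int:
    """Sum of Hamming distances from the candidate to every sample."""
    return sum(sum(a != b for a, b in zip(candidate, s)) for s in samples)


# Hypothetical posterior sample: 4 draws, 5 candidate breakpoint boundaries.
posterior_sample = [
    [0, 1, 0, 0, 1],
    [0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1],
    [0, 0, 0, 0, 1],
]
centroid = hamming_centroid(posterior_sample)
```

Unlike picking the single most frequent sampled structure, a centroid of this kind summarises where the bulk of the posterior mass lies, which matters when no individual structure is sampled often.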



Author information

Corresponding author

Correspondence to Leonardo de Oliveira Martins.

About this article

Cite this article

de Oliveira Martins, L., Kishino, H. Distribution of distances between topologies and its effect on detection of phylogenetic recombination. Ann Inst Stat Math 62, 145–159 (2010). https://doi.org/10.1007/s10463-009-0259-8

  • DOI: https://doi.org/10.1007/s10463-009-0259-8
