Journal of Molecular Evolution, Volume 61, Issue 3, pp 351–359

Biases in Phylogenetic Estimation Can Be Caused by Random Sequence Segments

Article

DOI: 10.1007/s00239-004-0352-9

Cite this article as:
Susko, E., Spencer, M. & Roger, A.J. J Mol Evol (2005) 61: 351. doi:10.1007/s00239-004-0352-9

Abstract

We consider the effects of fully or partially random sequences on the estimation of four-taxon phylogenies. Fully or partially random sequences occur when whole subsets of sequences or some sites for subsets of sequences are independent of sequence data for the other taxa. Random sequences can result from misalignment or arise because sites evolve at very fast rates in some portions of a tree, a situation that occurs especially in analyses involving deep divergence times. One might reasonably speculate that random sites will only add noise to the estimation of a phylogeny. We show that when a random sequence is added to a three-taxon alignment, it is more likely to be a neighbor of the sequence corresponding to the longest branch in the three-taxon tree. Surprisingly, when only about half of the sites show randomness, a long-branch-repels form of small sample bias occurs, and when a minority of sites show randomness this becomes a long-branch-attraction bias again. The most serious bias, one that does not vanish with increasing sequence length, occurs when more than one sequence is partially random. If there is a large amount of overlap in the random sites for two sequences, those two sequences will be attracted to each other; otherwise, they will repel each other. Random sequences or sites can, therefore, cause complicated biases in phylogenetic inference. We suggest performing analyses with and without potentially saturated sequences and/or misaligned sites, to check that these biases are not affecting the inferred branching pattern.
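The notion of a fully or partially random sequence can be illustrated with a minimal Python sketch. It is not the authors' simulation procedure: the helper names `randomize_sites` and `p_distance`, the uniform base frequencies, and the sequence length are all assumptions made for illustration. The sketch replaces a chosen fraction of sites in one sequence with nucleotides drawn independently of the original, then measures the proportion of differing sites.

```python
import random

NUCLEOTIDES = "ACGT"


def randomize_sites(seq, fraction, rng):
    """Replace a fraction of sites with uniform random nucleotides.

    Hypothetical helper: returns the modified sequence and the set of
    site indices that were redrawn independently of the original.
    """
    n_random = round(fraction * len(seq))
    sites = set(rng.sample(range(len(seq)), n_random))
    new_seq = "".join(
        rng.choice(NUCLEOTIDES) if i in sites else base
        for i, base in enumerate(seq)
    )
    return new_seq, sites


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


rng = random.Random(0)  # fixed seed so the sketch is reproducible
original = "".join(rng.choice(NUCLEOTIDES) for _ in range(20000))

# Fully random: every site is independent of the original sequence.
fully_random, _ = randomize_sites(original, 1.0, rng)
# Partially random: only half of the sites are redrawn.
half_random, half_sites = randomize_sites(original, 0.5, rng)

print(p_distance(original, fully_random))
print(p_distance(original, half_random))
```

Under these assumptions a redrawn site matches the original with probability 1/4, so the expected proportion of differing sites is about 0.75 for the fully random sequence and about 0.375 when half of the sites are redrawn. Tracking `half_sites` for two sequences would allow the overlap between their random sites to be varied, which is the quantity the abstract identifies as determining whether the two sequences attract or repel each other.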

Keywords

Biased estimation, Long branch attraction, Phylogeny, Random sequences

Copyright information

© Springer Science+Business Media, Inc. 2005

Authors and Affiliations

  • Edward Susko (1)
  • Mathew Spencer (1, 2)
  • Andrew J. Roger (3)

  1. Genome Atlantic, Department of Mathematics and Statistics, Dalhousie University, Halifax, Canada
  2. Genome Atlantic, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
  3. Genome Atlantic, Canadian Institute for Advanced Research, Program in Evolutionary Biology, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Canada
