Sequence-Length Requirements for Phylogenetic Methods

  • Bernard M.E. Moret
  • Usman Roshan
  • Tandy Warnow
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2452)


We study the sequence lengths required by neighbor-joining, greedy parsimony, and a phylogenetic reconstruction method (DCM-NJ+MP) based on disk-covering and the maximum parsimony criterion. We use extensive simulations based on random birth-death trees, with controlled deviations from ultrametricity, to collect data on the scaling of sequence-length requirements for each of the three methods as a function of the number of taxa, the rate of evolution on the tree, and the deviation from ultrametricity. Our experiments show that DCM-NJ+MP has consistently lower sequence-length requirements than the other two methods when trees of high topological accuracy are desired, although all methods require much longer sequences as the deviation from ultrametricity or the height of the tree grows. Our study has significant implications for large-scale phylogenetic reconstruction (where sequence-length requirements are a crucial factor), but also for future performance analyses in phylogenetics (since deviations from ultrametricity are proving pivotal).
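
The abstract speaks of "topological accuracy" without defining it here. One common way to quantify it is to compare the bipartitions (splits) induced by the internal edges of the true tree and of the reconstructed tree, as in the Robinson-Foulds distance. The following is a minimal sketch under stated assumptions, not necessarily the exact measure used in the study: trees are represented as nested Python tuples with string leaf labels (a hypothetical encoding chosen for illustration), and the function names `bipartitions`, `rf_distance`, and `false_negative_rate` are ours.

```python
def bipartitions(tree):
    """Return the nontrivial splits induced by the internal edges of a tree.

    Assumption: `tree` is a nested tuple whose leaves are strings,
    e.g. ((("A", "B"), "C"), ("D", "E")). The root is treated as
    arbitrary, so the tree is compared as an unrooted tree.
    """
    splits = set()

    def visit(node):
        # Return the set of leaf labels below `node`.
        if isinstance(node, str):
            return frozenset([node])
        return frozenset().union(*(visit(child) for child in node))

    all_taxa = visit(tree)
    anchor = min(all_taxa)  # fixed taxon used to normalise each split

    def collect(node):
        if isinstance(node, str):
            return frozenset([node])
        side = frozenset().union(*(collect(child) for child in node))
        # Skip trivial splits (a single leaf on one side, or the whole tree).
        if 1 < len(side) < len(all_taxa) - 1:
            # Store the side that does not contain the anchor taxon, so the
            # same split always has the same representation.
            splits.add(all_taxa - side if anchor in side else side)
        return side

    collect(tree)
    return splits


def rf_distance(true_tree, estimated_tree):
    """Number of splits present in exactly one of the two trees."""
    return len(bipartitions(true_tree) ^ bipartitions(estimated_tree))


def false_negative_rate(true_tree, estimated_tree):
    """Fraction of true-tree splits missing from the estimated tree."""
    true_splits = bipartitions(true_tree)
    if not true_splits:
        return 0.0
    missing = true_splits - bipartitions(estimated_tree)
    return len(missing) / len(true_splits)


if __name__ == "__main__":
    true_tree = ((("A", "B"), "C"), ("D", "E"))
    estimated = ((("A", "C"), "B"), ("D", "E"))
    print(rf_distance(true_tree, estimated))          # 2
    print(false_negative_rate(true_tree, estimated))  # 0.5
```

Under this measure, a sequence-length requirement can be read as the shortest sequence length at which a method's error rate falls below a chosen threshold; the threshold itself is a parameter of the experiment and is not fixed by this sketch.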


Keywords: Model Tree, Sequence Length, Phylogenetic Method, Expected Deviation, True Tree





Copyright information

© Springer-Verlag Berlin Heidelberg 2002

Authors and Affiliations

  • Bernard M.E. Moret (1)
  • Usman Roshan (2)
  • Tandy Warnow (2)

  1. Department of Computer Science, University of New Mexico, Albuquerque
  2. Department of Computer Sciences, University of Texas, Austin
