
Performance of Supertree Methods on Various Data Set Decompositions


Part of the book series: Computational Biology (COBO, volume 4)

Abstract

Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems such as Maximum Parsimony (MP) and Maximum Likelihood (ML), but they are severely limited by the number of taxa that they can handle in a reasonable timeframe. A standard heuristic approach to this problem is the divide-and-conquer strategy: decompose the data set into smaller subsets, solve the subsets (i.e., use MP or ML on each subset to obtain trees), and then combine the solutions to the subsets into a solution for the original data set. This last step, combining given trees into a single tree, is known as supertree construction in computational phylogenetics. The traditional application of supertree methods is to combine existing, published phylogenies into a single phylogeny. Here, we study supertree construction in the context of divide-and-conquer methods for large-scale tree reconstruction. We study several divide-and-conquer approaches and demonstrate experimentally their advantage over the traditional supertree technique of Matrix Representation with Parsimony (MRP), and over global heuristics such as the parsimony ratchet. For the ten large biological data sets under investigation, our study shows that the techniques used for dividing the data set into subproblems, as well as those used for merging them into a single solution, strongly influence the quality of the supertree construction. In most cases, our merging technique, the Strict Consensus Merger, outperformed MRP with respect to MP scores and running time. Divide-and-conquer techniques are also a highly competitive alternative to global heuristics such as the parsimony ratchet, especially on the more challenging data sets.
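To make the MRP baseline mentioned in the abstract more concrete, the following is a minimal sketch of the standard matrix-representation step (the Baum/Ragan coding): each clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for other taxa in that tree, and ? for taxa absent from that tree. The nested-tuple tree representation and the function names `clusters` and `mrp_matrix` are assumptions made for illustration; they are not taken from the chapter, and the subsequent parsimony search on the matrix is not shown.

```python
from typing import Dict, FrozenSet, List, Tuple, Union

# Hypothetical representation: a rooted tree is a leaf name (str)
# or a tuple of subtrees.
Tree = Union[str, tuple]


def clusters(tree: Tree) -> Tuple[FrozenSet[str], List[FrozenSet[str]]]:
    """Return the leaf set of `tree` and its non-trivial clades.

    Single leaves and the root clade (all taxa) are skipped because
    they carry no grouping information.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves: FrozenSet[str] = frozenset()
    found: List[FrozenSet[str]] = []
    for child in tree:
        child_leaves, child_clusters = clusters(child)
        leaves |= child_leaves
        found.extend(child_clusters)
        if len(child_leaves) > 1:
            found.append(child_leaves)
    return leaves, found


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """Encode source trees as a taxon -> character-string matrix."""
    per_tree = [clusters(t) for t in trees]
    taxa = sorted(set().union(*(leaves for leaves, _ in per_tree)))
    rows = {taxon: [] for taxon in taxa}
    for leaves, clades in per_tree:
        for clade in clades:
            for taxon in taxa:
                if taxon in clade:
                    rows[taxon].append("1")   # inside the clade
                elif taxon in leaves:
                    rows[taxon].append("0")   # in the tree, outside the clade
                else:
                    rows[taxon].append("?")   # taxon missing from this tree
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


# Two small overlapping source trees (made-up taxa, for illustration only).
example = mrp_matrix([(("A", "B"), ("C", "D")), (("B", "C"), "E")])
for taxon, row in example.items():
    print(taxon, row)
```

The resulting matrix would then be handed to a parsimony program, and the most parsimonious trees found for it are taken as the supertree estimate.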



Copyright information

© 2004 Springer Science+Business Media Dordrecht

About this chapter

Cite this chapter

Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T. (2004). Performance of Supertree Methods on Various Data Set Decompositions. In: Bininda-Emonds, O.R.P. (eds) Phylogenetic Supertrees. Computational Biology, vol 4. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-2330-9_15


  • DOI: https://doi.org/10.1007/978-1-4020-2330-9_15

  • Publisher Name: Springer, Dordrecht

  • Print ISBN: 978-1-4020-2329-3

  • Online ISBN: 978-1-4020-2330-9

  • eBook Packages: Springer Book Archive
