Abstract
Many large-scale phylogenetic reconstruction methods attempt to solve hard optimization problems such as Maximum Parsimony (MP) and Maximum Likelihood (ML), but they are severely limited in the number of taxa they can handle in a reasonable time. A standard heuristic response is the divide-and-conquer strategy: decompose the data set into smaller subsets, solve each subset (i.e., use MP or ML to obtain a tree for it), and then combine the subset solutions into a solution for the original data set. This last step, combining given trees into a single tree, is known in computational phylogenetics as supertree construction. The traditional application of supertree methods is to combine existing, published phylogenies into a single phylogeny. Here, we study supertree construction in the context of divide-and-conquer methods for large-scale tree reconstruction. We study several divide-and-conquer approaches and demonstrate experimentally their advantage over the traditional supertree technique of Matrix Representation with Parsimony (MRP), and over global heuristics such as the parsimony ratchet. For the ten large biological data sets under investigation, our study shows that both the technique used to divide the data set into subproblems and the technique used to merge the subproblem trees into a single solution strongly influence the quality of the resulting supertree. In most cases, our merging technique, the Strict Consensus Merger, outperformed MRP with respect to both MP score and running time. Divide-and-conquer techniques are also a highly competitive alternative to global heuristics such as the parsimony ratchet, especially on the more challenging data sets.
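To make the MRP baseline mentioned above concrete, the following is a minimal sketch of the standard matrix-representation coding for rooted source trees: each non-trivial clade of each source tree becomes one binary character, scored 1 for taxa inside the clade, 0 for taxa in that tree but outside the clade, and ? for taxa missing from that tree. The nested-tuple tree representation and the names `leaves`, `clades`, and `mrp_matrix` are assumptions made for illustration. This is not the chapter's implementation, and the subsequent parsimony search on the matrix is not shown.

```python
from typing import Dict, List, Union

# Hypothetical representation: a tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaves(tree: Tree) -> frozenset:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result


def clades(tree: Tree) -> List[frozenset]:
    """Leaf sets of the internal nodes, excluding the root and the leaves."""
    found: List[frozenset] = []

    def visit(node: Tree, is_root: bool) -> None:
        if isinstance(node, str):
            return
        if not is_root:
            found.append(leaves(node))
        for child in node:
            visit(child, False)

    visit(tree, True)
    return found


def mrp_matrix(trees: List[Tree]) -> Dict[str, str]:
    """Encode source trees as one row of '0', '1', '?' characters per taxon."""
    all_taxa = sorted(set().union(*(leaves(t) for t in trees)))
    rows: Dict[str, List[str]] = {taxon: [] for taxon in all_taxa}
    for tree in trees:
        present = leaves(tree)
        for clade in clades(tree):
            for taxon in all_taxa:
                if taxon in clade:
                    rows[taxon].append("1")   # taxon lies inside the clade
                elif taxon in present:
                    rows[taxon].append("0")   # in this tree, outside the clade
                else:
                    rows[taxon].append("?")   # taxon absent from this tree
    return {taxon: "".join(chars) for taxon, chars in rows.items()}


if __name__ == "__main__":
    # Two small overlapping source trees, as subproblem solutions might be.
    source_trees = [(("A", "B"), ("C", "D")), (("B", "C"), "E")]
    for taxon, row in mrp_matrix(source_trees).items():
        print(taxon, row)
```

In a divide-and-conquer setting the source trees would be the trees computed on the subsets, and the resulting matrix would be handed to a parsimony program to obtain the supertree.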
Copyright information
© 2004 Springer Science+Business Media Dordrecht
About this chapter
Cite this chapter
Roshan, U., Moret, B.M.E., Williams, T.L., Warnow, T. (2004). Performance of Supertree Methods on Various Data Set Decompositions. In: Bininda-Emonds, O.R.P. (ed.) Phylogenetic Supertrees. Computational Biology, vol 4. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-2330-9_15
DOI: https://doi.org/10.1007/978-1-4020-2330-9_15
Publisher Name: Springer, Dordrecht
Print ISBN: 978-1-4020-2329-3
Online ISBN: 978-1-4020-2330-9
eBook Packages: Springer Book Archive