Abstract
A major challenge in the biological sciences is the reconstruction of the Tree of Life. To this end, large genomic databases such as GenBank and SwissProt are being mined for clusters from which phylogenies can be inferred. Systematists and comparative biologists commonly combine such phylogenies into informative supertrees, which reveal information that was not explicitly displayed in any of the original phylogenies. However, whether a supertree is informative depends on particular overlap properties among the clusters from which it originates. In this work we formally introduce the concept of groves: sets of clusters with the potential to construct informative supertrees. Thus, through groves, maximal sets of candidate clusters for informative supertree construction can be identified in large databases before trees are inferred for each cluster. Groves also have the potential to lead to informative supermatrix construction. We develop methods that (i) efficiently identify particular types of groves and (ii) find lower and upper bounds on the minimal number of groves needed to cover all the trees or data sets in a database. Finally, we apply our methods to the green plant sequences from GenBank.
Cite this article
Ané, C., Eulenstein, O., Piaggio-Talice, R. et al. Groves of Phylogenetic Trees. Ann. Comb. 13, 139–167 (2009). https://doi.org/10.1007/s00026-009-0017-x
DOI: https://doi.org/10.1007/s00026-009-0017-x