Enhancing Searches for Optimal Trees Using SIESTA

  • Pranjal Vachaspati
  • Tandy Warnow
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)

Abstract

Many supertree estimation and multi-locus species tree estimation methods compute trees by combining trees on subsets of the species set based on some NP-hard optimization criterion. A recent approach to computing large trees has been to constrain the search space by defining a set of “allowed bipartitions”, and then use dynamic programming to find provably optimal solutions in polynomial time. Several phylogenomic estimation methods, such as ASTRAL, the MDC algorithm in PhyloNet, and FastRFS, use this approach. We present SIESTA, a method that allows the dynamic programming method to return a data structure that compactly represents all the optimal trees in the search space. As a result, SIESTA provides multiple capabilities, including: (1) counting the number of optimal trees, (2) calculating consensus trees, (3) generating a random optimal tree, and (4) annotating each branch in a given optimal tree with the proportion of optimal trees in which it appears. SIESTA is available as open-source software on GitHub at https://github.com/pranjalv123/SIESTA.
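The counting capability described above can be illustrated with a minimal sketch. It is not SIESTA's implementation: it assumes a simplified rooted setting in which the search space is a set of allowed clades rather than bipartitions, and the names count_optimal_trees, allowed_clades, and weight are hypothetical. The weight function stands in for whatever additive criterion a particular method optimizes. Scores are assumed to be integers so that ties can be detected exactly.

```python
from functools import lru_cache
from itertools import combinations


def count_optimal_trees(taxa, allowed_clades, weight):
    """Return (best_score, number_of_optimal_trees) over rooted binary trees
    whose clades all belong to allowed_clades (hypothetical toy formulation).

    weight(a, b) is an assumed additive score for joining child clades a and b.
    Returns (None, 0) if no tree can be built from the allowed clades.
    """
    taxa = frozenset(taxa)
    clades = {frozenset(c) for c in allowed_clades if c}
    clades |= {frozenset([t]) for t in taxa}  # leaves are always allowed
    clades.add(taxa)                          # so is the full taxon set

    @lru_cache(maxsize=None)
    def solve(clade):
        if len(clade) == 1:
            return (0, 1)  # a single leaf: score 0, exactly one tree
        best, count = None, 0
        anchor = min(clade)  # visit each unordered split {a, b} only once
        for a in clades:
            b = clade - a
            if not a < clade or anchor not in a or b not in clades:
                continue
            score_a, count_a = solve(a)
            score_b, count_b = solve(b)
            if count_a == 0 or count_b == 0:
                continue  # one side cannot be resolved within the search space
            score = score_a + score_b + weight(a, b)
            if best is None or score > best:
                best, count = score, count_a * count_b
            elif score == best:
                count += count_a * count_b  # tied split: add its optimal trees
        return (best, count)

    return solve(taxa)


# Toy input: four taxa with every nonempty subset allowed as a clade.
taxa = ["a", "b", "c", "d"]
every_clade = [set(c) for k in range(1, 5) for c in combinations(taxa, k)]
flat_best, flat_count = count_optimal_trees(taxa, every_clade, lambda a, b: 0)
```

With a constant weight and an unconstrained search space, every rooted binary tree is optimal, so flat_count equals 15, the number of rooted binary trees on four leaves. Storing the tied splits per clade, instead of only their counts, would give a compact representation from which a random optimal tree could be sampled or branch frequencies computed.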

Notes

Acknowledgments

We thank the anonymous reviewers for their helpful criticisms on an earlier draft, which greatly improved the manuscript. We also thank Erin Molloy, Sarah Christensen, and Siavash Mirarab for feedback on the initial results.

Funding. This study made use of the Illinois Campus Cluster, a computing resource that is operated by the Illinois Campus Cluster Program in conjunction with the National Center for Supercomputing Applications and which is supported by funds from the University of Illinois at Urbana-Champaign. This work was partially supported by the U.S. National Science Foundation Graduate Research Fellowship Program under Grant Number DGE-1144245 to PV and by U.S. National Science Foundation grant CCF-1535977 to TW.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Pranjal Vachaspati (1)
  • Tandy Warnow (1)

  1. Department of Computer Science, University of Illinois, Urbana, USA
