Using INC Within Divide-and-Conquer Phylogeny Estimation

Part of the Lecture Notes in Computer Science book series (LNBI, volume 11488)

Abstract

In a recent paper (Zhang, Rao, and Warnow, Algorithms for Molecular Biology 2019), the INC (incremental tree building) algorithm was presented and proven to be absolute fast converging under standard sequence evolution models. That paper also presented Constrained INC, a variant of INC that takes a set of disjoint constraint trees as input and uses INC to merge them into a tree on the full taxon set. We report on a study evaluating INC on a range of simulated datasets and show that its accuracy is very poor in comparison to standard methods. We also explore the design space for divide-and-conquer strategies for phylogeny estimation that use Constrained INC, and present modifications that improve accuracy. In particular, we present INC-ML, a divide-and-conquer approach to maximum likelihood (ML) estimation that comes close to the leading ML heuristics in accuracy and is more accurate than the current best distance-based methods.
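Quartet-based incremental methods such as INC place each new taxon by consulting quartet trees estimated from a distance matrix. The sketch below assumes the quartet trees come from the four-point method (Buneman, 1974). It is an illustration only, not the INC implementation (the authors' software is on the GitHub site listed in the references). The function and helper names, the nested-dict matrix layout, and the tie-breaking rule are hypothetical choices made for this example.

```python
from typing import Dict, Tuple

# A quartet split ((a, b), (c, d)) means the tree separates {a, b} from {c, d}.
Split = Tuple[Tuple[str, str], Tuple[str, str]]
DistanceMatrix = Dict[str, Dict[str, float]]


def four_point_quartet(d: DistanceMatrix, w: str, x: str, y: str, z: str) -> Split:
    """Return the quartet split on {w, x, y, z} chosen by the four-point method.

    For each of the three possible splits, sum the distances between the two
    taxa on each side; the split with the smallest sum is returned. Ties go to
    the earlier candidate in the list (a hypothetical convention; min() keeps
    the first minimum it finds).
    """
    candidates = [
        ((w, x), (y, z)),
        ((w, y), (x, z)),
        ((w, z), (x, y)),
    ]
    return min(candidates, key=lambda s: d[s[0][0]][s[0][1]] + d[s[1][0]][s[1][1]])


def symmetric(pairs: Dict[Tuple[str, str], float]) -> DistanceMatrix:
    """Build a symmetric nested-dict distance matrix from unordered pairs."""
    d: DistanceMatrix = {}
    for (a, b), value in pairs.items():
        d.setdefault(a, {})[b] = value
        d.setdefault(b, {})[a] = value
    return d


# Tree-additive distances for the tree ((A,B),(C,D)) with every edge of length 1.
EXAMPLE = symmetric({
    ("A", "B"): 2.0, ("C", "D"): 2.0,
    ("A", "C"): 3.0, ("A", "D"): 3.0,
    ("B", "C"): 3.0, ("B", "D"): 3.0,
})

if __name__ == "__main__":
    # The split sums are AC|BD = 6, AB|CD = 4, AD|BC = 6, so AB|CD is chosen.
    print(four_point_quartet(EXAMPLE, "A", "C", "B", "D"))  # (('A', 'B'), ('C', 'D'))
```

On exactly additive distances like these, the four-point method recovers the true quartet. Distances estimated from finite sequences are not exactly additive, so quartets on short internal branches or with noisy distances can be resolved incorrectly.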

Keywords

  • Inferring the evolutionary phylogeny of species
  • Phylogeny estimation
  • Maximum likelihood
  • Sample complexity
  • Divide-and-conquer

Supported by the University of Illinois at Urbana-Champaign and NSF grants DGE-1144245, CCF-1535977, and CCF-1535989. Computational experiments were performed on Blue Waters, supported by NSF grants OCI-0725070 and ACI-1238993 and by the State of Illinois.

Fig. 1.

References

  1. Bayzid, M.S., Hunt, T., Warnow, T.: Disk covering methods improve phylogenomic analyses. BMC Genomics 15(Suppl 6), S7 (2014)

  2. Boc, A., Diallo, A., Makarenkov, V.: T-REX: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res. 40, W573–W579 (2012)

  3. Buneman, P.: A note on the metric properties of trees. J. Comb. Theory (B) 17, 48–50 (1974)

  4. Erdös, P., Steel, M., Székely, L., Warnow, T.: Local quartet splits of a binary tree infer all quartet splits via one dyadic inference rule. Comput. Artif. Intell. 16(2), 217–227 (1997)

  5. Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (I). Random Struct. Algorithms 14, 153–184 (1999)

  6. Erdös, P., Steel, M., Székely, L., Warnow, T.: A few logs suffice to build (almost) all trees (II). Theor. Comput. Sci. 221, 77–118 (1999)

  7. Fletcher, W., Yang, Z.: INDELible: a flexible simulator of biological sequence evolution. Mol. Biol. Evol. 26(8), 1879–1888 (2009)

  8. Jukes, T.H., Cantor, C.R.: Evolution of protein molecules. In: Munro, H. (ed.) Mammalian Protein Metabolism, vol. 3, pp. 21–132. Academic Press, New York (1969)

  9. Lacey, M.R., Chang, J.T.: A signal-to-noise analysis of phylogeny estimation by neighbor-joining: insufficiency of polynomial length sequences. Math. Biosci. 199(2), 188–215 (2006)

  10. Le, T.: GitHub site for the INC and Constrained-INC software (2019). https://github.com/steven-le-thien/INC

  11. Le, T., Sy, A., Molloy, E., Zhang, Q., Rao, S., Warnow, T.: Using INC within divide-and-conquer phylogeny estimation - datasets (2019). https://databank.illinois.edu/datasets/IDB-8518809

  12. Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program. Mol. Biol. Evol. 32(10), 2798–2800 (2015). https://doi.org/10.1093/molbev/msv150

  13. Liu, K., et al.: SATé-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees. Syst. Biol. 61(1), 90–106 (2012). https://doi.org/10.1093/sysbio/syr095

  14. Maddison, W.: Gene trees in species trees. Syst. Biol. 46(3), 523–536 (1997)

  15. Mallo, D., De Oliveira Martins, L., Posada, D.: SimPhy: phylogenomic simulation of gene, locus, and species trees. Syst. Biol. 65(2), 334–344 (2016). https://doi.org/10.1093/sysbio/syv082

  16. Mirarab, S., Nguyen, N., Wang, L.S., Guo, S., Kim, J., Warnow, T.: PASTA: ultra-large multiple sequence alignment of nucleotide and amino acid sequences. J. Comput. Biol. 22, 377–386 (2015)

  17. Mirarab, S., Reaz, R., Bayzid, M.S., Zimmermann, T., Swenson, M.S., Warnow, T.: ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17), i541–i548 (2014)

  18. Molloy, E.K., Warnow, T.: NJMerge: a generic technique for scaling phylogeny estimation methods and its application to species trees. In: Blanchette, M., Ouangraoua, A. (eds.) RECOMB-CG 2018. LNCS, vol. 11183, pp. 260–276. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-00834-5_15

  19. Molloy, E.K., Warnow, T.: Statistically consistent divide-and-conquer pipelines for phylogeny estimation using NJMerge. bioRxiv (2018). https://doi.org/10.1101/469130

  20. Nelesen, S., Liu, K., Wang, L.S., Linder, C.R., Warnow, T.: DACTAL: divide-and-conquer trees (almost) without alignments. Bioinformatics 28(12), i274–i282 (2012)

  21. Price, M.N., Dehal, P.S., Arkin, A.P.: FastTree 2 - approximately maximum-likelihood trees for large alignments. PLoS ONE 5(3), 1–10 (2010)

  22. Roch, S., Sly, A.: Phase transition in the sample complexity of likelihood-based phylogeny inference. Probab. Theory Relat. Fields 169(1), 3–62 (2017)

  23. Sayyari, E., Whitfield, J.B., Mirarab, S.: Fragmentary gene sequences negatively impact gene tree and species tree reconstruction. Mol. Biol. Evol. 34(12), 3279–3291 (2017)

  24. Stamatakis, A.: RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics 30, 1312–1313 (2014)

  25. Swofford, D.L.: PAUP*: Phylogenetic Analysis Using Parsimony (*and Other Methods), Version 4a161 (2018). http://phylosolutions.com/paup-test/

  26. Tavaré, S.: Some probabilistic and statistical problems in the analysis of DNA sequences. In: Lectures on Mathematics in the Life Sciences, vol. 17, pp. 57–86. American Mathematical Society (1986)

  27. Warnow, T.: Divide-and-conquer tree estimation: opportunities and challenges. In: Warnow, T. (ed.) Bioinformatics and Phylogenetics. Springer (2019)

  28. Warnow, T., Moret, B.M., St. John, K.: Absolute convergence: true trees from short sequences. In: Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, pp. 186–195. Society for Industrial and Applied Mathematics (2001)

  29. Zhang, Q., Rao, S., Warnow, T.: Constrained incremental tree building: new absolute fast converging phylogeny estimation methods with improved scalability and accuracy. Algorithms Mol. Biol. 14(2), 2 (2019). https://rdcu.be/blBXm

Author information

Correspondence to Tandy Warnow.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Le, T., Sy, A., Molloy, E.K., Zhang, Q., Rao, S., Warnow, T. (2019). Using INC Within Divide-and-Conquer Phylogeny Estimation. In: Holmes, I., Martín-Vide, C., Vega-Rodríguez, M. (eds) Algorithms for Computational Biology. AlCoB 2019. Lecture Notes in Computer Science, vol 11488. Springer, Cham. https://doi.org/10.1007/978-3-030-18174-1_12

  • DOI: https://doi.org/10.1007/978-3-030-18174-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18173-4

  • Online ISBN: 978-3-030-18174-1

  • eBook Packages: Computer Science, Computer Science (R0)