ASTRAL-III: Increased Scalability and Impacts of Contracting Low Support Branches

  • Chao Zhang
  • Erfan Sayyari
  • Siavash Mirarab
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10562)


Discordances between species trees and gene trees can complicate phylogenetic reconstruction. ASTRAL is a leading method for inferring species trees from gene trees while accounting for incomplete lineage sorting. It finds the tree that shares the maximum number of quartets with the input trees, drawing its bipartitions from a predefined set X. In this paper, we introduce ASTRAL-III, which substantially improves on the running time of ASTRAL-II by handling polytomies more efficiently, exploiting similarities between gene trees, and trimming unnecessary parts of the search space. The asymptotic running time in the presence of polytomies is reduced from \(O(n^3k|X|^{1.726})\) for n species and k genes to \(O(D|X|^{1.726})\), where \(D=O(nk)\) is the sum of degrees of all unique nodes in the input trees. ASTRAL-III enables us to test whether contracting low support branches in gene trees improves accuracy by reducing noise. In extensive simulations and on real data, we show that removing branches with very low support improves accuracy, while overly aggressive filtering is harmful.
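As a rough illustration of what contracting low support branches means, the following Python sketch collapses every internal branch whose support falls below a threshold, which turns the affected node into a polytomy. It is not ASTRAL-III's implementation: the `Node` class, the functions `contract_low_support` and `to_newick`, and the threshold of 10 are all hypothetical names and values chosen for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical tree node used only for this illustration."""
    name: Optional[str] = None        # leaf label; None for internal nodes
    support: Optional[float] = None   # support of the branch above this node
    children: List["Node"] = field(default_factory=list)


def contract_low_support(node: Node, threshold: float) -> Node:
    """Return a copy of the tree with low-support internal branches collapsed."""
    new_children = []
    for child in node.children:
        child = contract_low_support(child, threshold)
        is_internal = bool(child.children)
        if is_internal and child.support is not None and child.support < threshold:
            # Collapse the branch: its children attach directly to this node,
            # creating (or enlarging) a polytomy.
            new_children.extend(child.children)
        else:
            new_children.append(child)
    return Node(node.name, node.support, new_children)


def to_newick(node: Node) -> str:
    """Serialize the tree as a Newick-like string (without the trailing ';')."""
    if not node.children:
        return node.name
    inner = ",".join(to_newick(c) for c in node.children)
    label = "" if node.support is None else str(node.support)
    return f"({inner}){label}"


# Gene tree ((A,B)95,(C,(D,E)5)80) with support values as internal labels.
tree = Node(children=[
    Node(support=95, children=[Node("A"), Node("B")]),
    Node(support=80, children=[
        Node("C"),
        Node(support=5, children=[Node("D"), Node("E")]),
    ]),
])

print(to_newick(contract_low_support(tree, 10)))  # ((A,B)95,(C,D,E)80)
```

With the threshold set to 10, only the branch above (D,E) is removed, leaving a three-way polytomy among C, D, and E. Raising the threshold would collapse more branches, which is the kind of increasingly aggressive filtering the abstract compares.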


Keywords: Phylogenomics, Incomplete lineage sorting, ASTRAL



This work was supported by NSF grant IIS-1565862 to SM and ES. Computations were performed at the San Diego Supercomputer Center (SDSC) through allocations from XSEDE, which is supported by NSF grant ACI-1053575.


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of California at San Diego, San Diego, USA
