Statistical Consistency of Coalescent-Based Species Tree Methods Under Models of Missing Data

  • Conference paper
Comparative Genomics (RECOMB-CG 2017)

Part of the book series: Lecture Notes in Computer Science (LNBI, volume 10562)

Abstract

The estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, duplication and loss, and horizontal gene transfer, which cause gene trees to differ from the species tree. Methods that estimate species trees in the presence of gene tree discord caused by incomplete lineage sorting (ILS) have been developed and proved statistically consistent when ILS is the only source of discord and every gene tree contains the full set of species. Here we address the statistical consistency of coalescent-based species tree estimation methods when gene trees are missing species, i.e., in the presence of missing data.

Acknowledgements

MN was supported by NSF grants DBI-1461364, CCF-1535977 and AF:1513629 and by a fellowship from the CompGen initiative in the Coordinated Science Laboratory at UIUC. JC was supported by the Mathematics Department at UIUC.

A great deal of thanks is owed to our advisor, Dr. Tandy Warnow, who guided this manuscript from start to finish and pushed us to leave no stone unturned.

Author information

Corresponding author

Correspondence to Michael Nute.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Nute, M., Chou, J. (2017). Statistical Consistency of Coalescent-Based Species Tree Methods Under Models of Missing Data. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science, vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_15

  • DOI: https://doi.org/10.1007/978-3-319-67979-2_15

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-67978-5

  • Online ISBN: 978-3-319-67979-2

  • eBook Packages: Computer Science, Computer Science (R0)
