Abstract
The estimation of species trees from multiple genes is complicated by processes such as incomplete lineage sorting, duplication and loss, and horizontal gene transfer, that result in gene trees that differ from the species tree. Methods to estimate species trees in the presence of gene tree discord resulting from incomplete lineage sorting (ILS) have been developed and proved to be statistically consistent when gene tree discord is due only to ILS and every gene tree has the full set of species. Here we address statistical consistency of coalescent-based species tree estimation methods when gene trees are missing species, i.e., in the presence of missing data.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Similar content being viewed by others
References
Bryant, D., Bouckaert, R., Felsenstein, J., Rosenberg, N.A., RoyChoudhury, A.: Inferring species trees directly from biallelic genetic markers: bypassing gene trees in a full coalescent analysis. Mol. Biol. Evol. 29(8), 1917–1932 (2012)
Chifman, J., Kubatko, L.: Quartet inference from SNP data under the coalescent. Bioinformatics 30(23), 3317–3324 (2014)
Dasarathy, G., Nowak, R., Roch, S.: Data requirement for phylogenetic inference from multiple loci: a new distance method. IEEE/ACM Trans. Comput. Biol. Bioinf. 12(2), 422–432 (2015)
DeGiorgio, M., Degnan, J.H.: Fast and consistent estimation of species trees using supermatrix rooted triples. Mol. Biol. Evol. 27(3), 552–569 (2010)
Edwards, S.V.: Is a new and general theory of molecular systematics emerging? Evolution 63, 1–19 (2009)
Graybeal, A.: Is it better to add taxa or characters to a difficult phylogenetic problem? Syst. Biol. 47(1), 9–17 (1998)
Heled, J., Drummond, A.J.: Bayesian inference of species trees from multilocus data. Mol. Biol. Evol. 27(3), 570–580 (2010)
Hovmöller, R., Knowles, L.L., Kubatko, L.S.: Effects of missing data on species tree estimation under the coalescent. Mol. Phylogenet. Evol. 69, 1057–1062 (2013)
Jewett, E., Rosenberg, N.: iGLASS: an improvement to the GLASS method for estimating species trees from gene trees. J. Comput. Biol. 19(3), 293–315 (2012)
Kingman, J.F.C.: On the genealogy of large populations. J. Appl. Probab. 19, 27 (1982)
Kubatko, L.S., Carstens, B.C., Knowles, L.L.: STEM: species tree estimation using maximum likelihood for gene trees under coalescence. Bioinformatics 25(7), 971–973 (2009)
Larget, B.R., Kotha, S.K., Dewey, C.N., Ané, C.: BUCKy: gene tree/species tree reconciliation with Bayesian concordance analysis. Bioinformatics 26(22), 2910–2911 (2010)
Lefort, V., Desper, R., Gascuel, O.: FastME 2.0: a comprehensive, accurate, and fast distance-based phylogeny inference program: table 1. Mol. Biol. Evol. 32(10), 2798–2800 (2015)
Lemmon, A.R., Brown, J.M., Stanger-Hall, K., Lemmon, E.M.: The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst. Biol. 58(1), 130–145 (2009)
Liu, L.: BEST: Bayesian estimation of species trees under the coalescent model. Bioinformatics 24(21), 2542–2543 (2008)
Liu, L., Yu, L.: Estimating species trees from unrooted gene trees. Syst. Biol. 60(5), 661–667 (2011)
Liu, L., Yu, L., Edwards, S.V.: A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evol. Biol. 10(1), 302 (2010)
Liu, L., Yu, L., Pearl, D.K., Edwards, S.V.: Estimating species phylogenies using coalescence times among sequences. Syst. Biol. 58(5), 468–77 (2009)
Maddison, W.P.: Gene trees in species trees. Syst. Biol. 46(3), 523–536 (1997)
Mirarab, S., Reaz, R., Bayzid, M., Zimmermann, T., Swenson, M., Warnow, T.: ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30(17), i541–i548 (2014)
Mirarab, S., Warnow, T.: ASTRAL-II: coalescent-based species tree estimation with many hundreds of taxa and thousands of genes. Bioinformatics 31(12), i44–i52 (2015)
Mossel, E., Roch, S.: Incomplete lineage sorting: consistent phylogeny estimation from multiple loci. IEEE/ACM Trans. Comput. Biol. Bioinf. 7(1), 166–171 (2010)
Page, R.D.M.: Modified mincut supertrees. In: Guigó, R., Gusfield, D. (eds.) WABI 2002. LNCS, vol. 2452, pp. 537–551. Springer, Heidelberg (2002). doi:10.1007/3-540-45784-4_41
Pollock, D.D., Zwickl, D.J., McGuire, J.A., Hillis, D.M.: Increased taxon sampling is advantageous for phylogenetic inference. Syst. Biol. 51, 664–671 (2002)
Roch, S., Warnow, T.: On the robustness to gene tree estimation error (or lack thereof) of coalescent-based species tree methods. Syst. Biol. 64(4), 663–676 (2015)
Saitou, N., Nei, M.: The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425 (1987)
Semple, C., Steel, M.: Phylogenetics. Oxford Lecture Series in Mathematics and its Applications. Oxford University Press, Oxford (2003)
Steel, M.: The complexity of reconstructing trees from qualitative characters and subtrees. J. Classif. 9, 91–116 (1992)
Streicher, J.W., Schulte, J.A., Wiens, J.J.: How should genes and taxa be sampled for phylogenomic analyses with missing data? An empirical study in iguanian lizards. Syst. Biol. 65(1), 128–145 (2016)
Swofford, D.: PAUP*: Phylogenetic analysis using parsimony (* and other methods) Ver. 4. Sinauer Associates, Sunderland, Massachusetts (2002)
Vachaspati, P., Warnow, T.: ASTRID: Accurate species trees from internode distances. BMC Genom. 16(Suppl. 10), S3 (2015)
Wickett, N.J., Mirarab, S., Nguyen, N., Warnow, T., Carpenter, E., Matasci, N., Ayyampalayam, S., Barker, M.S., Burleigh, J.G., Gitzendanner, M.A., Ruhfel, B.R., Wafulal, E., Derl, J.P., Graham, S.W., Mathews, S., Melkonian, M., Soltis, D.E., Soltis, P.S., Miles, N.W., Rothfels, C.J., Pokorny, L., Shaw, A.J., De Gironimo, L., Stevenson, D.W., Sureko, B., Villarreal, J.C., Roure, B., Philippe, H., de Pamphilis, C.W., Chen, T., Deyholos, M.K., Baucom, R.S., Kutchan, T.M., Augustin, M.M., Wang, J., Zhang, Y., Tian, Z., Yan, Z., Wu, X., Sun, X., Wong, G.K.S., Leebens-Mack, J.: Phylotranscriptomic analysis of the origin and diversification of land plants. Proc. Nat. Acad. Sci. 111(45), E4859–E4868 (2014)
Wiens, J.: Missing data, incomplete taxa, and phylogenetic accuracy. Syst. Biol. 52, 528–538 (2003)
Wiens, J.: Missing data and the design of phylogenetic analyses. J. Biomed. Inform. 39, 34–42 (2006)
Wiens, J.J., Morrill, M.C.: Missing data in phylogenetic analysis: reconciling results from simulations and empirical data. Syst. Biol. 60, 719–731 (2011)
Xi, Z., Liu, L., Davis, C.C.: The impact of missing data on species tree estimation. Mol. Biol. Evol. 33(3), 838–860 (2016)
Yang, J., Warnow, T.: Fast and accurate methods for phylogenomic analyses. BMC Bioinform. 12(Suppl. 9), S4 (2011)
Zwickl, D.J., Hillis, D.M.: Increased taxon sampling greatly reduces phylogenetic error. Syst. Biol. 51, 588–598 (2002)
Acknowledgements
MN was supported by NSF grants DBI-1461364, CCF-1535977 and AF:1513629 and by a fellowship from the CompGen initiative in the Coordinated Science Laboratory at UIUC. JC was supported by the Mathematics Department at UIUC.
A great deal of thanks is owed to our advisor, Dr. Tandy Warnow, who guided this manuscript from start to finish and pushed us to leave no stone unturned.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Nute, M., Chou, J. (2017). Statistical Consistency of Coalescent-Based Species Tree Methods Under Models of Missing Data. In: Meidanis, J., Nakhleh, L. (eds) Comparative Genomics. RECOMB-CG 2017. Lecture Notes in Computer Science(), vol 10562. Springer, Cham. https://doi.org/10.1007/978-3-319-67979-2_15
Download citation
DOI: https://doi.org/10.1007/978-3-319-67979-2_15
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67978-5
Online ISBN: 978-3-319-67979-2
eBook Packages: Computer ScienceComputer Science (R0)