Abstract
We show that many topological features of level-1 species networks are identifiable from the distribution of the gene tree quartets under the network multi-species coalescent model. In particular, every cycle of size at least 4 and every hybrid node in a cycle of size at least 5 are identifiable. This is a step toward justifying the inference of such networks, which was recently implemented by Solís-Lemus and Ané. We additionally show how to compute quartet concordance factors for a network in terms of simpler networks, and explore some circumstances in which cycles of size 3 and hybrid nodes in 4-cycles can be detected.
References
Allman ES, Degnan JH, Rhodes JA (2011) Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J Math Biol 62(6):833–862
Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Mol Biol Evolut 24(2):412–426
Arnold ML (1997) Natural hybridization and evolution, vol 53. Oxford University Press, Oxford
Bapteste E, van Iersel L, Janke A, Kelchner S, Kelk S, McInerney JO, Morrison DA, Nakhleh L, Steel M, Stougie L, Whitfield J (2013) Networks: expanding evolutionary thinking. Trends Genet 29(8):439–441
Carstens BC, Knowles LL (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Syst Biol 56(3):400–411
Degnan JH (2010) Probabilities of gene trees with intraspecific sampling given a species tree. In: Knowles LL, Kubatko LS (eds) Estimating species trees: practical and theoretical aspects. Wiley-Blackwell, pp 53–78. ISBN 0470526858
Ellstrand NC, Whitkus R, Rieseberg LH (1996) Distribution of spontaneous plant hybrids. Proc Nat Acad Sci U S A 93(10):5090–5093
Gusfield D, Bansal V, Bafna V, Song YS (2007) A decomposition theory for phylogenetic networks and incompatible characters. J Comput Biol 14(10):1247–1272
Huber KT, van Iersel L, Moulton V, Scornavacca C, Wu T (2017) Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica 77(1):173–200
Huber KT, Moulton V, Semple C, Wu T (2017) Quarnet inference rules for level-1 networks. https://arxiv.org/pdf/1711.06720.pdf
Keijsper JCM, Pendavingh RA (2014) Reconstructing a phylogenetic Level-1 network from quartets. Bull Math Biol 76(10):2517–2541
Linder CR, Rieseberg LH (2004) Reconstructing patterns of reticulate evolution in plants. Am J Bot 91(10):1700–1708
Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evolut Biol 10(1):302
Mallet J (2005) Hybridization as an invasion of the genome. Trends Ecol Evolut 20(5):229–237. Special issue: invasions, guest edited by Michael E. Hochberg and Nicholas J. Gotelli
Meng C, Kubatko LS (2009) Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theor Popul Biol 75(1):35–45
Nakhleh L (2010) Evolutionary phylogenetic networks: models and issues. In: Heath L, Ramakrishnan N (eds) Problem solving handbook in computational biology and bioinformatics. Springer, Boston, pp 125–158
Noor MA, Feder JL (2006) Speciation genetics: evolving approaches. Nat Rev Genet 7(11):851–861
Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evolut 5:568–583
Pollard DA, Iyer VN, Moses AM, Eisen MB (2006) Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet 2(10):1634–1647
Rieseberg LH, Baird SJ, Gardner KA (2000) Hybridization, introgression, and linkage evolution. Plant Mol Biol 42(1):205–224
Rosselló F, Valiente G (2009) All that glisters is not galled. Math Biosci 221(1):54–59
Semple C, Steel M (2005) Phylogenetics. Oxford University Press, Oxford
Solís-Lemus C, Ané C (2016) Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 12(3):e1005896
Solís-Lemus C, Ané C, Yang M (2016) Inconsistency of species tree methods under gene flow. Syst Biol 65(5):843–851
Steel M (2016) Phylogeny: discrete and random processes in evolution. SIAM, Philadelphia
Sullivant S, Talaska K, Draisma J (2010) Trek separation for Gaussian graphical models. Ann Statist 38(3):1665–1685
Syring J, Willyard A, Cronn R, Liston A (2005) Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. Am J Bot 92(12):2086–2100
Wakeley J (2008) Coalescent theory: an introduction, vol 58. Roberts and Company Publishers, Englewood
Yu Y, Degnan JH, Nakhleh L (2014) Maximum likelihood inference of reticulate evolutionary histories. PNAS 111(296–305):11
Yu Y, Degnan JH, Nakhleh L (2012) The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet 8:e1002660
Yu Y, Than C, Degnan JH, Nakhleh L (2011) Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst Biol 60(2):138–149
Zhang C, Ogilvie HW, Drummond AJ, Stadler T (2018) Bayesian inference of species networks from multilocus sequence data. Mol Biol Evolut 35(2):504–517
Zhu J, Yu Y, Nakhleh L (2016) In the light of deep coalescence: revisiting trees within networks. BMC Bioinform 17:415
Zhu S, Degnan J (2017) Displayed trees do not determine distinguishability under the network multispecies coalescent. Syst Biol 66:283–298
Acknowledgements
The author deeply thanks John A. Rhodes and Elizabeth S. Allman for their technical assistance and suggestions during the development of this work, and the reviewers for their valuable suggestions and observations.
Additional information
This research was supported in part by the National Institutes of Health Grant R01 GM117590, awarded under the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences.
Appendix
Here we prove Proposition 1 of Section 2. The argument uses the following lemma.
Lemma 17
Let \(\mathcal {N}^+\) be a (metric or topological) rooted network on X and let \(Z\subset X\). For any edge e below LSA(Z), with a descendant in Z, there are \(x,y\in Z\) such that e is in a simple trek in \(\mathcal {N}^+\) from x to y whose edges are below LSA(Z).
Proof
Let \(x\in Z\) be below e. By Lemma 2 there exists \(y\in Z\) with LSA(x, y) above e.
Suppose y is not below e. Let \(P_x\) be a path from LSA(x, y) to x containing e and let \(P_y\) be a path from LSA(x, y) to y. Let u be the minimal node in the intersection of \(P_x\) and \(P_y\). Since y is not below e, u cannot be below e. Then the subpath of \(P_x\) from u to x, which contains e, and the subpath of \(P_y\) from u to y form a simple trek containing e.
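The construction in this first case can be illustrated with a minimal Python sketch. It assumes that each path is given as a list of nodes ordered from LSA(x, y) downward, and that a trek is simple when its two paths share only their top node. The function name `trek_from_paths` and the node labels in the example are hypothetical and serve only as illustration.

```python
def trek_from_paths(p_x, p_y):
    """Cut two directed paths with a common start at their lowest common node.

    p_x, p_y: lists of nodes, both starting at the same node (e.g. LSA(x, y))
    and ending at x and y respectively. Returns the two subpaths starting at
    the lowest node u that lies on both paths.
    """
    common = set(p_x) & set(p_y)
    # Common nodes are ordered along p_x; the lowest one has the largest index.
    u = max(common, key=p_x.index)
    return p_x[p_x.index(u):], p_y[p_y.index(u):]


# Hypothetical example: e = (p, a), x = a, y = b, and y is not below e.
sub_x, sub_y = trek_from_paths(["r", "p", "a"], ["r", "p", "h", "b"])
```

In this example the two subpaths meet only at `p`, so together they form a simple trek from `a` to `b` that contains the edge `(p, a)`.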
Now assume y is below e. Since e is below LSA(x, y), there exists a path from LSA(x, y) to one of y or x that does not pass through the child of e. Without loss of generality suppose such a path \(P_y\) goes from LSA(x, y) to y. Let \(P_x\) be a path from LSA(x, y) to x that passes through e. Let \(A=A(P_x,P_y)\) be the set of nodes above e, common to \(P_y\) and \(P_x\). Let \(a\in A\) be the minimal node in A.
Let \(B(P_y,P_x)\) be the set of nodes below e, common to \(P_y\) and \(P_x\). We may assume that we choose \(P_x\) and \(P_y\) such that \(B=B(P_y,P_x)\) has minimal cardinality. If \(B=\emptyset \) then the desired trek is easily constructed, with top a. So suppose \(B\ne \emptyset \) has minimal element \(b^-\) and maximal element \(b^+\). We are going to contradict the minimality of B. Note that \(b^+\) must be the hybrid node of a cycle containing e (see Fig. 25 for a graphical reference).
Since \(b^-\) is not LSA(x, y), there exists a path \(P^*\) from LSA(x, y) to one of x or y that does not pass through \(b^-\). Note that \(P^*\) has to intersect at least one of \(P_y\) or \(P_x\) at an internal node below \(b^-\). Let \(C_1\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_y\), and let \(C_2\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_x\). Let c be the maximal node in \(C_1\cup C_2\). We can assume, without loss of generality, that c is in \(P_y\). Indeed, if c were instead in \(P_x\), we could construct paths \(P_x'\) and \(P_y'\) where \(P_i'\) contains all the edges of \(P_i\) above \(b^-\) and all the edges of \(P_j\) below \(b^-\) for \(i,j\in \{x,y\}\), \(i\ne j\). Note that \(P_x'\) passes through e and does not contain c, while \(P_y'\) does not pass through e, contains c, and \(B=B(P_y',P_x')\).
Denote by W the set of nodes in \((P^*\cap P_y)\cup (P^*\cap P_x)\) and let w be the minimal node of W above \(b^-\). Since \(\mathcal {N}^+\) is binary, w cannot be a or \(b^+\) (see Fig. 25 for a graphical reference). There are five cases for the location of w in the graph formed by the paths \(P_y\) and \(P_x\). These are:
1. w is in \(P_y\), above \(b^+\) but below a.
2. w is in \(P_x\), above \(b^+\) but below e.
3. w is in \(P_x\), above e but below a.
4. w is in one or more of \(P_x\) or \(P_y\), above a.
5. w is in one or more of \(P_x\) or \(P_y\), above \(b^-\) but below \(b^+\).
Figure 25 depicts in gray the graph formed by the paths \(P_y\) and \(P_x\), and in black the possible subpaths of \(P^*\) from w to c. In each of cases 1, 2 and 3 we can find a simple trek containing e, as depicted in Fig. 26, by choosing the appropriate edges, and thus B was not minimal. For cases 4 and 5 there are two possibilities: (i) w is in both \(P_y\) and \(P_x\); (ii) w is in only one of \(P_y\) or \(P_x\). For case 4 (i), the situation is simple, and we can find a simple trek as depicted on the left in Fig. 27. For case 4 (ii), we first find the node in A that is immediately above w. Then, as depicted on the left of Fig. 27, we can find a simple trek.
For case 5 we do not find a simple trek directly. Instead we construct two paths \(P_1\) and \(P_2\) from LSA(x, y) to x and y, respectively, only one of which contains e, such that \(B(P_1,P_2)\) has at least one fewer node than B. For case 5 (i), we take \(P_1\) to be \(P_x\), and for \(P_2\) we take the edges of \(P_y\) above w, the edges of \(P^*\) between w and c, and the edges of \(P_y\) below c. For case 5 (ii), we assume without loss of generality that w is in \(P_x\). Let b be the node in B immediately above w. Let \(P_1\) be the path containing the edges in \(P_x\) that are above b, the edges in \(P_y\) that are below b but above the node \(b'\in B\) immediately below w, and finally the edges in \(P_x\) below \(b'\). Let \(P_2\) be the path containing the edges in \(P_y\) that are above b, the edges in \(P_x\) that are above w but below b, the edges in \(P^*\) that are above c but below w, and finally the edges in \(P_y\) that are below c. Figure 27 (right) depicts \(P_1\) (red) and \(P_2\) (blue) for (i) and (ii). Since \(B(P_1,P_2)\) has at least one fewer node than B, and we assumed that B has minimal cardinality, we reach a contradiction. \(\square \)
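The statement of Lemma 17 can be checked by brute force on a small example. The Python sketch below is an illustration only, not part of the proof. It assumes that LSA(Z) is the lowest node lying on every directed path from the root to every taxon in Z, and that a trek is simple when its two paths share only their top node. The toy network `NETWORK` and all function names are hypothetical.

```python
from itertools import product

# Toy rooted binary network (hypothetical, not taken from the paper):
# node -> list of children; "h" is the single hybrid node.
NETWORK = {
    "root": ["o", "r"],
    "r": ["p", "q"],
    "p": ["a", "h"],
    "q": ["h", "c"],
    "h": ["b"],
    "o": [], "a": [], "b": [], "c": [],
}


def paths(g, src, dst):
    """All directed paths from src to dst, each as a list of nodes."""
    if src == dst:
        return [[src]]
    return [[src] + rest for child in g[src] for rest in paths(g, child, dst)]


def lsa(g, root, taxa):
    """Lowest node lying on every directed path from root to every taxon."""
    all_paths = [p for t in taxa for p in paths(g, root, t)]
    stable = set(all_paths[0]).intersection(*all_paths[1:])
    # Stable nodes are totally ordered along any path; take the last one.
    return max(stable, key=all_paths[0].index)


def simple_treks(g, tops, x, y):
    """Yield (P1, P2): paths from a common top to x and to y sharing only the top."""
    for top in tops:
        for p1, p2 in product(paths(g, top, x), paths(g, top, y)):
            if set(p1) & set(p2) == {top}:
                yield p1, p2


def check_lemma(g, root, taxa):
    """Return (target, covered) edge sets for the statement of the lemma."""
    top = lsa(g, root, taxa)
    below = [n for n in g if paths(g, top, n)]  # the LSA and its descendants
    # Edges below the LSA whose child is in taxa or has a descendant in taxa.
    target = {(u, v) for u in below for v in g[u]
              if any(paths(g, v, t) for t in taxa)}
    covered = set()
    for x, y in product(taxa, repeat=2):
        if x != y:
            for p1, p2 in simple_treks(g, below, x, y):
                covered |= set(zip(p1, p1[1:])) | set(zip(p2, p2[1:]))
    return target, covered


target, covered = check_lemma(NETWORK, "root", ["a", "b"])
print(target <= covered)
```

Restricting the tops of the treks to nodes at or below the LSA guarantees that every trek edge is below LSA(Z), matching the last condition in the lemma. The enumeration is exponential in general and is meant only for small examples.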
Proof (of Proposition 1)
Let \(M^+=\mathcal {N}^\oplus _Z\). Let \(M^-\) be the graph obtained from \(M^+\) by ignoring the direction of all tree edges and then suppressing LSA(\(Z,\mathcal {N}^+\)), that is, the unrooted network induced from \(M^+\). Denote by \(M'\) the graph obtained by ignoring the directions of all tree edges in \(M^+\), so that suppressing degree-two nodes of either \(M^-\) or \(M'\) gives \((\mathcal {N}^+_Z)^-\). Let K be the graph formed by all the edges in simple treks in \(\mathcal {N}^-\) from x to y for all \(x,y\in Z\), so that suppressing degree-two nodes in K gives \((\mathcal {N}^-)_Z\). Showing either \(M'=K\) or \(M^-=K\) will prove the claim.
First we show that if LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)) then \(M'=K\), by arguing that \(M'\) and K have the same edges. Let e be an edge of \(M'\). Since LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)), \(M'\) is a subgraph of \(\mathcal {N}^-\) and e is directed in \(M^+\). By Lemma 17, e is in a simple trek in \(M^+\) from x to y, for some \(x,y\in Z\). This trek induces a simple trek in \(M'\) from x to y, and therefore a simple trek in \(\mathcal {N}^-\) from x to y. Thus, e is in K.
Now let e be an edge of K. Then there exists a simple trek \((\overline{P_1},\overline{P_2})\) in \(\mathcal {N}^-\) from x to y, for some \(x,y\in Z\), containing e. Let \(v=\)top\((\overline{P_1},\overline{P_2})\) and let T be the sequence of incident edges in \(\mathcal {N}^+\) from x to y composed of edges inducing those in \(\overline{P_1}\) and \(\overline{P_2}\). Since \((\overline{P_1},\overline{P_2})\) is simple, T does not have repeated edges. Following T in \(\mathcal {N}^+\) from x to y, edges are first traversed “uphill” (in the reverse direction) until there is a first “downhill” edge (u, w). The next edge in T cannot be uphill, as otherwise it would be hybrid and \((\overline{P_1},\overline{P_2})\) would not have been a trek in \(\mathcal {N}^-\). This argument applies to all consecutive edges in T until we end at y. Thus, there is a simple trek \((P_1,P_2)\) from x to y in \(\mathcal {N}^+\) with top u. Note that u must be below or equal to LSA(\(Z,\mathcal {N}^+\)) since otherwise the trek would not be simple. Moreover, \(P_1\) and \(P_2\) contain only edges in \(M^+\) and thus in \(M'\) after the directions of the tree edges are omitted. Thus, e is in \(M'\), so \(K=M'.\)
If LSA(\(Z,\mathcal {N}^+\))\(=\)LSA(\(X,\mathcal {N}^+\)) then \(M^-=K\) follows from a straightforward modification of the previous argument to account for the suppression of LSA(\(Z,\mathcal {N}^+\)) in both \(M^-\) and K. \(\square \)
Cite this article
Baños, H. Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model. Bull Math Biol 81, 494–534 (2019). https://doi.org/10.1007/s11538-018-0485-4
DOI: https://doi.org/10.1007/s11538-018-0485-4