Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model

Baños, Hector

doi:10.1007/s11538-018-0485-4

Special Issue: Algebraic Methods in Phylogenetics
Published: 09 August 2018

Volume 81, pages 494–534, (2019)
Bulletin of Mathematical Biology

Hector Baños¹

Abstract

We show that many topological features of level-1 species networks are identifiable from the distribution of the gene tree quartets under the network multi-species coalescent model. In particular, every cycle of size at least 4 and every hybrid node in a cycle of size at least 5 are identifiable. This is a step toward justifying the inference of such networks which was recently implemented by Solís-Lemus and Ané. We show additionally how to compute quartet concordance factors for a network in terms of simpler networks, and explore some circumstances in which cycles of size 3 and hybrid nodes in 4-cycles can be detected.

References

Allman ES, Degnan JH, Rhodes JA (2011) Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J Math Biol 62(6):833–862
Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Mol Biol Evolut 24(2):412–426
Arnold ML (1997) Natural hybridization and evolution, vol 53. Oxford University Press, Oxford
Bapteste E, van Iersel L, Janke A, Kelchner S, Kelk S, McInerney JO, Morrison DA, Nakhleh L, Steel M, Stougie L, Whitfield J (2013) Networks: expanding evolutionary thinking. Trends Genet 29(8):439–441
Carstens BC, Knowles LL (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Syst Biol 56(3):400–411
Degnan JH (2010) Probabilities of gene trees with intraspecific sampling given a species tree. In: Knowles LL, Kubatko LS (eds) Estimating species trees: practical and theoretical aspects. Wiley-Blackwell, pp 53–78. ISBN 0470526858
Ellstrand NC, Whitkus R, Rieseberg LH (1996) Distribution of spontaneous plant hybrids. Proc Nat Acad Sci U S A 93(10):5090–5093
Gusfield D, Bansal V, Bafna V, Song YS (2007) A decomposition theory for phylogenetic networks and incompatible characters. J Comput Biol 14(10):1247–1272
Huber KT, van Iersel L, Moulton V, Scornavacca C, Wu T (2017) Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica 77(1):173–200
Huber KT, Moulton V, Semple C, Wu T (2017) Quarnet inference rules for level-1 networks. https://arxiv.org/pdf/1711.06720.pdf
Keijsper JCM, Pendavingh RA (2014) Reconstructing a phylogenetic level-1 network from quartets. Bull Math Biol 76(10):2517–2541
Linder CR, Rieseberg LH (2004) Reconstructing patterns of reticulate evolution in plants. Am J Bot 91(10):1700–1708
Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evolut Biol 10(1):302
Mallet J (2005) Hybridization as an invasion of the genome. Trends Ecol Evolut 20(5):229–237. Special issue: invasions, guest edited by Michael E. Hochberg and Nicholas J. Gotelli
Meng C, Kubatko LS (2009) Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theor Popul Biol 75(1):35–45
Nakhleh L (2010) Evolutionary phylogenetic networks: models and issues. In: Heath L, Ramakrishnan N (eds) Problem solving handbook in computational biology and bioinformatics. Springer, Boston, pp 125–158
Noor MA, Feder JL (2006) Speciation genetics: evolving approaches. Nat Rev Genet 7(11):851–861
Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evolut 5:568–583
Pollard DA, Iyer VN, Moses AM, Eisen MB (2006) Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet 2(10):1634–1647
Rieseberg LH, Baird SJ, Gardner KA (2000) Hybridization, introgression, and linkage evolution. Plant Mol Biol 42(1):205–224
Rosselló F, Valiente G (2009) All that glisters is not galled. Math Biosci 221(1):54–59
Semple C, Steel M (2005) Phylogenetics. Oxford University Press, Oxford
Solís-Lemus C, Ané C (2016) Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 12(3):e1005896
Solís-Lemus C, Ané C, Yang M (2016) Inconsistency of species tree methods under gene flow. Syst Biol 65(5):843–851
Steel M (2016) Phylogeny: discrete and random processes in evolution. SIAM, Philadelphia
Sullivant S, Talaska K, Draisma J (2010) Trek separation for Gaussian graphical models. Ann Statist 38(3):1665–1685
Syring J, Willyard A, Cronn R, Liston A (2005) Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. Am J Bot 92(12):2086–2100
Wakeley J (2008) Coalescent theory: an introduction, vol 58. Roberts and Company Publishers, Englewood
Yu Y, Degnan JH, Nakhleh L (2014) Maximum likelihood inference of reticulate evolutionary histories. PNAS 111(296–305):11
Yu Y, Degnan JH, Nakhleh L (2012) The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet 8:e1002660
Yu Y, Than C, Degnan JH, Nakhleh L (2011) Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst Biol 60(2):138–149
Zhang C, Ogilvie HW, Drummond AJ, Stadler T (2018) Bayesian inference of species networks from multilocus sequence data. Mol Biol Evolut 35(2):504–517
Zhu J, Yu Y, Nakhleh L (2016) In the light of deep coalescence: revisiting trees within networks. BMC Bioinform 17:415
Zhu S, Degnan J (2017) Displayed trees do not determine distinguishability under the network multispecies coalescent. Syst Biol 66:283–298

Acknowledgements

The author deeply thanks John A. Rhodes and Elizabeth S. Allman for their technical assistance and suggestions during the development of this work, and the reviewers for their valuable suggestions and observations.

Author information

Authors and Affiliations

University of Alaska Fairbanks, P.O. Box 756660, Fairbanks, AK, 99775-6660, USA
Hector Baños

Corresponding author

Correspondence to Hector Baños.

Additional information

This research was supported in part by the National Institutes of Health Grant R01 GM117590, awarded under the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences.

Appendix

Here, Proposition 1 of Section 2 is proved. The argument uses the following.

Lemma 17

Let \(\mathcal {N}^+\) be a (metric or topological) rooted network on X and let \(Z\subset X\). For any edge e below LSA(Z), with a descendant in Z, there are \(x,y\in Z\) such that e is in a simple trek in \(\mathcal {N}^+\) from x to y whose edges are below LSA(Z).
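To make the hypothesis of Lemma 17 concrete, the following Python sketch computes LSA(Z) and the edges below it that have a descendant in Z. It is only an illustration under stated assumptions: the toy network, its node names, and the helper functions are hypothetical and not taken from the paper, and the sketch assumes the usual definition of LSA(Z) as the lowest node lying on every directed path from the root to every element of Z, which is not restated in this appendix.

```python
# Hypothetical toy rooted network (not from the paper): root r, outgroup d,
# and a single cycle u-v-h-w whose hybrid node is h.
EDGES = [("r", "d"), ("r", "u"), ("u", "v"), ("u", "w"),
         ("v", "x1"), ("v", "h"), ("w", "x2"), ("w", "h"), ("h", "x3")]

def children(node):
    return [c for p, c in EDGES if p == node]

def paths(src, dst):
    """All directed paths from src to dst, each as a list of nodes."""
    if src == dst:
        return [[dst]]
    return [[src] + rest for c in children(src) for rest in paths(c, dst)]

def descendants(node):
    """Nodes reachable from node, including node itself."""
    seen, stack = {node}, [node]
    while stack:
        for c in children(stack.pop()):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def lsa(root, taxa):
    """Assumed definition: lowest node on every root-to-z path, z in taxa."""
    all_paths = [p for z in taxa for p in paths(root, z)]
    stable = set(all_paths[0]).intersection(*all_paths[1:])
    # Stable nodes are ordered by ancestry; the lowest one has no other
    # stable node among its proper descendants.
    return next(v for v in stable if not (descendants(v) - {v}) & stable)

def edges_below(top, taxa):
    """Edges whose parent is top or below it, with a descendant in taxa."""
    below = descendants(top)
    return [(p, c) for p, c in EDGES
            if p in below and descendants(c) & set(taxa)]

print(lsa("r", ["x1", "x2", "x3"]))
print(edges_below("u", ["x1", "x2"]))
```

For Z = {x1, x2} the edges entering the hybrid node h are excluded, since h has no descendant in Z; these are exactly the edges the lemma does not need to place on a trek.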

Proof

Let \(x\in Z\) be below e. By Lemma 2 there exists \(y\in Z\) with LSA(x, y) above e.

Suppose y is not below e. Let \(P_x\) be a path from LSA(x, y) to x containing e and let \(P_y\) be a path from LSA(x, y) to y. Let u be the minimal node in the intersection of \(P_x\) and \(P_y\). Since y is not below e, u cannot be below e. Then the subpath of \(P_x\) from u to x, which contains e, and the subpath of \(P_y\) from u to y form a simple trek containing e.

Now assume y is below e. Since e is below LSA(x, y), there exists a path from LSA(x, y) to one of y or x that does not pass through the child of e. Without loss of generality suppose such a path \(P_y\) goes from LSA(x, y) to y. Let \(P_x\) be a path from LSA(x, y) to x that passes through e. Let \(A=A(P_x,P_y)\) be the set of nodes above e, common to \(P_y\) and \(P_x\). Let \(a\in A\) be the minimal node in A.

Let \(B(P_y,P_x)\) be the set of nodes below e, common to \(P_y\) and \(P_x\). We may assume that we choose \(P_x\) and \(P_y\) such that \(B=B(P_y,P_x)\) has minimal cardinality. If \(B=\emptyset \) then the desired trek is easily constructed, with top a. So suppose \(B\ne \emptyset \) has minimal element \(b^-\) and maximal element \(b^+\). We are going to contradict the minimality of B. Note that \(b^+\) must be the hybrid node of a cycle containing e (see Fig. 25 for a graphical reference).

Since \(b^-\) is not LSA(x, y), there exists a path \(P^*\) from LSA(x, y) to one of x or y that does not pass through \(b^-\). Note that \(P^*\) has to intersect at least one of \(P_y\) or \(P_x\) at an internal node below \(b^-\). Let \(C_1\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_y\), and let \(C_2\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_x\). Let c be the maximal node in \(C_1\cup C_2\). We can assume, without loss of generality, that c is in \(P_y\). Indeed, if c were instead in \(P_x\), we could construct paths \(P_x'\) and \(P_y'\) where \(P_i'\) contains all the edges in \(P_i\) above \(b^-\) and all edges of \(P_j\) below \(b^-\) for \(i,j\in \{x,y\}\), \(i\ne j\). Note that \(P_x'\) passes through e and does not contain c, while \(P_y'\) does not pass through e, contains c, and \(B=B(P_y',P_x')\).
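The tail exchange used in the "without loss of generality" step can be sketched in a few lines of Python. This is an illustration only: paths are represented as lists of nodes from top to bottom, and the node names and the function `swap_tails` are hypothetical, not notation from the paper.

```python
def swap_tails(p_x, p_y, b):
    """Exchange the portions of two node paths that lie below a shared node b.

    Each result keeps one path's nodes above b and continues with the
    other path's nodes from b downward.
    """
    i, j = p_x.index(b), p_y.index(b)
    return p_x[:i] + p_y[j:], p_y[:j] + p_x[i:]

# Hypothetical paths sharing the top node "s" and the node "b";
# "e_top" and "e_bot" stand for the endpoints of the edge e, and "c" lies
# on the first path below "b".
p_x = ["s", "e_top", "e_bot", "b", "c", "x"]
p_y = ["s", "m", "b", "y"]

new_x, new_y = swap_tails(p_x, p_y, "b")
print(new_x)  # still passes through e, no longer contains c
print(new_y)  # avoids e, now contains c
```

The set of nodes common to both paths is unchanged by the exchange, which is why \(B\) is the same before and after.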

Denote by W the set of nodes in \((P^*\cap P_y)\cup (P^*\cap P_x)\) and let w be the minimal node of W above \(b^-\). Since \(\mathcal {N}^+\) is binary, w cannot be a or \(b^+\) (see Fig. 25 for a graphical reference). There are five possible locations of w in the graph formed by the paths \(P_y\) and \(P_x\):

1. w is in \(P_y\), above \(b^+\) but below a.
2. w is in \(P_x\), above \(b^+\) but below e.
3. w is in \(P_x\), above e but below a.
4. w is in one or more of \(P_x\) or \(P_y\), above a.
5. w is in one or more of \(P_x\) or \(P_y\), above \(b^-\) but below \(b^+\).

Figure 25 depicts in gray the graph formed by the paths \(P_y\) and \(P_x\), and in black the possible subpaths of \(P^*\) from w to c. In any of cases 1, 2, or 3 we can find a simple trek containing e, as depicted in Fig. 26, by choosing the appropriate edges, and thus B was not minimal. For cases 4 and 5 there are two possibilities: (i) w is in both \(P_y\) and \(P_x\); (ii) w is in only one of \(P_y\) or \(P_x\). For case 4 (i), the situation is simple, and we can find a simple trek as depicted on the left in Fig. 27. For case 4 (ii), we first find the node in A that is right above w. Then, as depicted on the left of Fig. 27, we can find a simple trek.

For case 5 we do not find a simple trek directly; instead we construct two paths \(P_1\) and \(P_2\) from LSA(x, y) to x and y, respectively, only one of which contains e, such that \(B(P_1,P_2)\) has at least one fewer node than B. For case 5 (i), we take \(P_1\) to be \(P_x\), and for \(P_2\) we take the edges of \(P_y\) above w, the edges of \(P^*\) between w and c, and the edges of \(P_y\) below c. For case 5 (ii), we assume without loss of generality that w is in \(P_x\). Let b be the node in B right above w. Let \(P_1\) be the path containing the edges in \(P_x\) that are above b, the edges in \(P_y\) that are below b but above the node \(b'\in B\) right below w, and finally the edges in \(P_x\) below \(b'\). Let \(P_2\) be the path containing the edges in \(P_y\) that are above b, the edges in \(P_x\) that are above w but below b, the edges in \(P^*\) that are above c but below w, and finally the edges in \(P_y\) that are below c. Figure 27 (right) depicts \(P_1\) (red) and \(P_2\) (blue) for (i) and (ii). Since \(B(P_1,P_2)\) has at least one fewer node than B, and B was assumed to have minimal cardinality, we reach a contradiction. \(\square \)
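The statement of Lemma 17 can be checked by brute force on a small example. The Python sketch below does this for a hypothetical toy network that is not taken from the paper. It assumes that a trek from x to y is a pair of directed paths starting at a common top node and ending at x and y, and that the trek is simple when the two paths share only the top. The value of LSA(Z) is supplied by hand for this particular network rather than computed.

```python
from itertools import product

# Hypothetical toy rooted network: root r, outgroup d, and one cycle
# u-v-h-w with hybrid node h.
EDGES = [("r", "d"), ("r", "u"), ("u", "v"), ("u", "w"),
         ("v", "x1"), ("v", "h"), ("w", "x2"), ("w", "h"), ("h", "x3")]
NODES = sorted({n for e in EDGES for n in e})

def paths(src, dst):
    """All directed paths from src to dst, each as a list of nodes."""
    if src == dst:
        return [[dst]]
    return [[src] + rest
            for p, c in EDGES if p == src
            for rest in paths(c, dst)]

def simple_treks(x, y, lsa_node):
    """Simple treks from x to y whose top is lsa_node or below it."""
    treks = []
    for top in NODES:
        if not paths(lsa_node, top):        # top is not below lsa_node
            continue
        for p1, p2 in product(paths(top, x), paths(top, y)):
            if set(p1) & set(p2) == {top}:  # the paths share only the top
                treks.append((p1, p2))
    return treks

def path_edges(path):
    return set(zip(path, path[1:]))

def edge_on_simple_trek(edge, taxa, lsa_node):
    """True if edge lies on a simple trek between two distinct taxa."""
    return any(edge in path_edges(p1) | path_edges(p2)
               for x, y in product(taxa, repeat=2) if x != y
               for p1, p2 in simple_treks(x, y, lsa_node))

Z = ["x1", "x3"]
LSA_Z = "u"  # assumption: read off the toy network by hand

# Edges below LSA(Z) that have a descendant in Z (the lemma's hypothesis).
candidates = [e for e in EDGES
              if paths(LSA_Z, e[0]) and any(paths(e[1], z) for z in Z)]
print(all(edge_on_simple_trek(e, Z, LSA_Z) for e in candidates))
```

With Z = {x1, x3} the edge (u, w) is covered only by the trek with top u that reaches x3 through the hybrid edge (w, h), which is the kind of trek the case analysis in the proof is designed to produce.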

Proof (of Proposition 1)

Let \(M^+=\mathcal {N}^\oplus _Z\). Let \(M^-\) be the graph obtained from \(M^+\) by ignoring the direction of all tree edges and then suppressing the node LSA(\(Z,\mathcal {N}^+\)), that is, the induced unrooted network from \(M^+\). Denote by \(M'\) the graph obtained by ignoring the directions of all tree edges in \(M^+\), so that suppressing degree-two nodes of either \(M^-\) or \(M'\) gives \((\mathcal {N}^+_Z)^-\). Let K be the graph formed by all the edges in simple treks in \(\mathcal {N}^-\) from x to y for all \(x,y\in Z\), so that suppressing degree-two nodes in K gives \((\mathcal {N}^-)_Z\). Showing either \(M'=K\) or \(M^-=K\) will prove the claim.

First we show that if LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)) then \(M'=K\), by arguing that \(M'\) and K have the same edges. Let e be an edge of \(M'\). Since LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)), \(M'\) is a subgraph of \(\mathcal {N}^-\) and e is directed in \(M^+\). By Lemma 17, e is in a simple trek in \(M^+\) from x to y, for some \(x,y\in Z\). This trek induces a simple trek in \(M'\) from x to y, and therefore a simple trek in \(\mathcal {N}^-\) from x to y. Thus, e is in K.

Now let e be an edge of K. Then there exists a simple trek \((\overline{P_1},\overline{P_2})\) in \(\mathcal {N}^-\) from x to y, for some \(x,y\in Z\), containing e. Let \(v=\)top\((\overline{P_1},\overline{P_2})\) and let T be the sequence of incident edges in \(\mathcal {N}^+\) from x to y formed by the edges inducing those in \(\overline{P_1}\) and \(\overline{P_2}\). Since \((\overline{P_1},\overline{P_2})\) is simple, T does not have repeated edges. Following T in \(\mathcal {N}^+\) from x to y, edges are first traversed “uphill” (in reverse direction) until there is a first “downhill” edge (u, w). The next edge in T cannot be uphill, as otherwise it would be hybrid and \((\overline{P_1},\overline{P_2})\) would not have been a trek in \(\mathcal {N}^-\). This argument applies to all consecutive edges in T until we end at y. Thus, there is a simple trek \((P_1,P_2)\) from x to y in \(\mathcal {N}^+\) with top u. Note that u must be below or equal to LSA(\(Z,\mathcal {N}^+\)), since otherwise the trek would not be simple. Moreover, \(P_1\) and \(P_2\) contain only edges in \(M^+\), and thus in \(M'\) after the directions of the tree edges are omitted. Thus, e is in \(M'\), so \(K=M'.\)

If LSA(\(Z,\mathcal {N}^+\))\(=\)LSA(\(X,\mathcal {N}^+\)) then \(M^-=K\) follows from a straightforward modification of the previous argument to account for the suppression of LSA\((Z,\mathcal {N}^+)\) in both \(M^-\) and K. \(\square \)

About this article

Baños, H. Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model. Bull Math Biol 81, 494–534 (2019). https://doi.org/10.1007/s11538-018-0485-4

Received: 27 November 2017
Accepted: 30 July 2018
Published: 09 August 2018
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s11538-018-0485-4
