
Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model

Abstract

We show that many topological features of level-1 species networks are identifiable from the distribution of the gene tree quartets under the network multi-species coalescent model. In particular, every cycle of size at least 4 and every hybrid node in a cycle of size at least 5 are identifiable. This is a step toward justifying the inference of such networks, which was recently implemented by Solís-Lemus and Ané. We show additionally how to compute quartet concordance factors for a network in terms of simpler networks, and explore some circumstances in which cycles of size 3 and hybrid nodes in 4-cycles can be detected.
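For orientation, the simplest quartet concordance factors are those of a species tree on four taxa. Under the multispecies coalescent, if the internal edge of the species quartet ab|cd has length t in coalescent units, the gene quartet ab|cd has probability \(1-\tfrac{2}{3}e^{-t}\) and each of the two alternatives has probability \(\tfrac{1}{3}e^{-t}\) (see, e.g., Allman et al. 2011). The sketch below encodes only this standard tree formula; it is not the network computation developed in this paper, and the function name is hypothetical.

```python
import math


def tree_quartet_cfs(t: float) -> tuple[float, float, float]:
    """Concordance factors (CF_ab|cd, CF_ac|bd, CF_ad|bc) for a species tree
    quartet ab|cd whose internal edge has length t in coalescent units."""
    if t < 0:
        raise ValueError("branch length must be non-negative")
    minor = math.exp(-t) / 3.0  # each discordant quartet
    return (1.0 - 2.0 * minor, minor, minor)
```

With t = 0 all three concordance factors equal 1/3; as t grows the major one approaches 1.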


References

  1. Allman ES, Degnan JH, Rhodes JA (2011) Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J Math Biol 62(6):833–862

  2. Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Mol Biol Evolut 24(2):412–426

  3. Arnold ML (1997) Natural hybridization and evolution, vol 53. Oxford University Press, Oxford

  4. Bapteste E, van Iersel L, Janke A, Kelchner S, Kelk S, McInerney JO, Morrison DA, Nakhleh L, Steel M, Stougie L, Whitfield J (2013) Networks: expanding evolutionary thinking. Trends Genet 29(8):439–441

  5. Carstens BC, Knowles LL, Tim C (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Syst Biol 56(3):400–411

  6. Degnan JH (2010) Probabilities of gene trees with intraspecific sampling given a species tree. In: Knowles LL, Kubatko LS (eds) Estimating species trees: practical and theoretical aspects. Wiley-Blackwell, pp 53–78. ISBN 0470526858

  7. Ellstrand NC, Whitkus R, Rieseberg LH (1996) Distribution of spontaneous plant hybrids. Proc Nat Acad Sci U S A 93(10):5090–5093

  8. Gusfield D, Bansal V, Bafna V, Song YS (2007) A decomposition theory for phylogenetic networks and incompatible characters. J Comput Biol 14(10):1247–1272

  9. Huber KT, van Iersel L, Moulton V, Scornavacca C, Wu T (2017) Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica 77(1):173–200

  10. Huber KT, Moulton V, Semple C, Wu T (2017) Quarnet inference rules for level-1 networks. https://arxiv.org/pdf/1711.06720.pdf

  11. Keijsper JCM, Pendavingh RA (2014) Reconstructing a phylogenetic Level-1 network from quartets. Bull Math Biol 76(10):2517–2541

  12. Linder CR, Rieseberg LH (2004) Reconstructing patterns of reticulate evolution in plants. Am J Bot 91(10):1700–1708

  13. Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evolut Biol 10(1):302

  14. Mallet J (2005) Hybridization as an invasion of the genome. Trends Ecol Evolut 20(5):229–237. Special issue: invasions, guest edited by Michael E. Hochberg and Nicholas J. Gotelli

  15. Meng C, Kubatko LS (2009) Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theor Popul Biol 75(1):35–45

  16. Nakhleh L (2010) Evolutionary phylogenetic networks: models and issues. In: Heath L, Ramakrishnan N (eds) Problem solving handbook in computational biology and bioinformatics. Springer, Boston, pp 125–158

  17. Noor MA, Feder JL (2006) Speciation genetics: evolving approaches. Nat Rev Genet 7(11):851–861

  18. Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evolut 5:568–583

  19. Pollard DA, Iyer VN, Moses AM, Eisen MB (2006) Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet 2(10):1634–1647

  20. Rieseberg LH, Baird SJ, Gardner KA (2000) Hybridization, introgression, and linkage evolution. Plant Mol Biol 42(1):205–224

  21. Rosselló F, Valiente G (2009) All that glisters is not galled. Math Biosci 221(1):54–59

  22. Semple C, Steel M (2005) Phylogenetics. Oxford University Press, Oxford

  23. Solís-Lemus C, Ané C (2016) Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 12(3):e1005896

  24. Solís-Lemus C, Ané C, Yang M (2016) Inconsistency of species tree methods under gene flow. Syst Biol 65(5):843–851

  25. Steel M (2016) Phylogeny: discrete and random processes in evolution. SIAM, Philadelphia

  26. Sullivant S, Talaska K, Draisma J (2010) Trek separation for Gaussian graphical models. Ann Statist 38(3):1665–1685

  27. Syring J, Willyard A, Cronn R, Liston A (2005) Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. Am J Bot 92(12):2086–2100

  28. Wakeley J (2008) Coalescent theory: an introduction, vol 58. Roberts and Company Publishers, Englewood

  29. Yu Y, Degnan JH, Nakhleh L (2014) Maximum likelihood inference of reticulate evolutionary histories. PNAS 111(11):296–305

  30. Yu Y, Degnan JH, Nakhleh L (2012) The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet 8:e1002660

  31. Yu Y, Than C, Degnan JH, Nakhleh L (2011) Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst Biol 60(2):138–149

  32. Zhang C, Ogilvie HW, Drummond AJ, Stadler T (2018) Bayesian inference of species networks from multilocus sequence data. Mol Biol Evolut 35(2):504–517

  33. Zhu J, Yu Y, Nakhleh L (2016) In the light of deep coalescence: revisiting trees within networks. BMC Bioinform 17:415

  34. Zhu S, Degnan J (2017) Displayed trees do not determine distinguishability under the network multispecies coalescent. Syst Biol 66:283–298


Acknowledgements

The author deeply thanks John A. Rhodes and Elizabeth S. Allman for their technical assistance and suggestions during the development of this work, and the reviewers for their valuable suggestions and observations.

Author information

Correspondence to Hector Baños.

Additional information

This research was supported in part by the National Institutes of Health Grant R01 GM117590, awarded under the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences.

Appendix

Here we prove Proposition 1 of Section 2. The argument uses the following lemma.

Lemma 17

Let \(\mathcal {N}^+\) be a (metric or topological) rooted network on X and let \(Z\subset X\). For any edge e below LSA(Z) that has a descendant in Z, there are \(x,y\in Z\) such that e is in a simple trek in \(\mathcal {N}^+\) from x to y whose edges are all below LSA(Z).

Proof

Let \(x\in Z\) be below e. By Lemma 2 there exists \(y\in Z\) with LSA(xy) above e.

Suppose y is not below e. Let \(P_x\) be a path from LSA(xy) to x containing e, and let \(P_y\) be a path from LSA(xy) to y. Let u be the minimal node in the intersection of \(P_x\) and \(P_y\). Since y is not below e, u cannot be below e. Then the subpath of \(P_x\) from u to x, which contains e, and the subpath of \(P_y\) from u to y form a simple trek containing e.

Now assume y is below e. Since e is below LSA(xy), there exists a path from LSA(xy) to one of y or x that does not pass through the child of e. Without loss of generality suppose such a path \(P_y\) goes from LSA(xy) to y. Let \(P_x\) be a path from LSA(xy) to x that passes through e. Let \(A=A(P_x,P_y)\) be the set of nodes above e, common to \(P_y\) and \(P_x\). Let \(a\in A\) be the minimal node in A.

Let \(B(P_y,P_x)\) be the set of nodes below e common to \(P_y\) and \(P_x\). We may assume that \(P_x\) and \(P_y\) are chosen so that \(B=B(P_y,P_x)\) has minimal cardinality. If \(B=\emptyset \), then the desired trek is easily constructed, with top a. So suppose \(B\ne \emptyset \), with minimal element \(b^-\) and maximal element \(b^+\); we will contradict the minimality of B. Note that \(b^+\) must be the hybrid node of a cycle containing e (see Fig. 25 for a graphical reference).

Since \(b^-\) is not LSA(xy), there exists a path \(P^*\) from LSA(xy) to one of x or y that does not pass through \(b^-\). Note that \(P^*\) has to intersect at least one of \(P_y\) or \(P_x\) at an internal node below \(b^-\). Let \(C_1\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_y\), and let \(C_2\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_x\). Let c be the maximal node in \(C_1\cup C_2\). We can assume, without loss of generality, that c is in \(P_y\). Indeed, if c were instead in \(P_x\), we could construct paths \(P_x'\) and \(P_y'\), where \(P_i'\) contains all the edges in \(P_i\) above \(b^-\) and all edges of \(P_j\) below \(b^-\) for \(i,j\in \{x,y\}\), \(i\ne j\). Note that \(P_x'\) passes through e and does not contain c, while \(P_y'\) does not pass through e, contains c, and \(B=B(P_y',P_x')\).

Denote by W the set of nodes in \((P^*\cap P_y)\cup (P^*\cap P_x)\) and let w be the minimal node of W above \(b^-\). Since \(\mathcal {N}^+\) is binary, w cannot be a or \(b^+\) (see Fig. 25 for a graphical reference). There are five possible cases for the location of w in the network formed by the paths \(P_y\) and \(P_x\):

  1. w is in \(P_y\), above \(b^+\) but below a.

  2. w is in \(P_x\), above \(b^+\) but below e.

  3. w is in \(P_x\), above e but below a.

  4. w is in one or more of \(P_x\) or \(P_y\), above a.

  5. w is in one or more of \(P_x\) or \(P_y\), above \(b^-\) but below \(b^+\).

Figure 25 shows in gray the graph formed by the paths \(P_y\) and \(P_x\), and in black the possible subpaths of \(P^*\) from w to c. In each of cases 1, 2, and 3 we can find a simple trek containing e, as depicted in Fig. 26, by choosing the appropriate edges; thus B was not minimal. For cases 4 and 5 there are two possibilities: (i) w is in both \(P_y\) and \(P_x\); (ii) w is in only one of \(P_y\) or \(P_x\). For case 4 (i) the situation is simple, and we can find a simple trek as depicted on the left in Fig. 27. For case 4 (ii), we first find the node in A immediately above w; then, as depicted on the left of Fig. 27, we can find a simple trek.

For case 5 we do not find a simple trek directly; instead we construct two paths \(P_1\) and \(P_2\) from LSA(xy) to x and y, respectively, only one of which contains e, with at least one fewer node in \(B(P_1,P_2)\) than in B. For case 5 (i), we take \(P_1\) to be \(P_x\), and for \(P_2\) we take the edges of \(P_y\) above w, the edges below c, and the edges of \(P^*\) between w and c. For case 5 (ii), we assume without loss of generality that w is in \(P_x\). Let b be the node in B immediately above w. Let \(P_1\) be the path consisting of the edges in \(P_x\) above b, the edges in \(P_y\) below b but above the node \(b'\in B\) immediately below w, and finally the edges in \(P_x\) below \(b'\). Let \(P_2\) be the path consisting of the edges in \(P_y\) above b, the edges in \(P_x\) above a but below b, the edges in \(P^*\) above c but below w, and finally the edges in \(P_y\) below c. Figure 27 (right) depicts \(P_1\) (red) and \(P_2\) (blue) for (i) and (ii). Since \(B(P_1,P_2)\) has at least one fewer node than B, this contradicts the minimality of B. \(\square \)

Fig. 25

In gray, the subgraph formed by P and \(P'\); dashed edges indicate that P and \(P'\) may intersect, and dotted segments represent a succession of edges. In black, the different cases for the possible edges of \(P^*\) above b but below a

Fig. 26

The treks in case 1 (left), case 2 (center), and case 3 (right)

Fig. 27

(Left) The treks in the two possibilities of case 4. (Right) The two possibilities of case 5, where black segments represent edges that may be both red and blue
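On small networks, Lemma 17 can be checked by brute force. The sketch below assumes that LSA(Z) is the lowest node lying on every path from the root to every element of Z, and that an edge is "below LSA(Z)" when its tail is LSA(Z) or a descendant of it. It computes LSA(Z) through dominator sets (the nodes on every root-to-v path), enumerates all simple treks with tops below LSA(Z), and tests that every edge below LSA(Z) with a descendant in Z is covered. The network `CHILDREN` (a 4-cycle u, v1, h, v2 with hybrid node h) and all function names are hypothetical illustrations, and the path enumeration is exponential in general.

```python
from graphlib import TopologicalSorter
from itertools import combinations, product

# Hypothetical rooted level-1 network: a 4-cycle u-v1-h-v2 with hybrid node h,
# stored as node -> list of children. Leaves are a, b, c, d.
CHILDREN = {
    "r": ["u", "d"], "u": ["v1", "v2"], "v1": ["h", "a"], "v2": ["h", "c"],
    "h": ["b"], "a": [], "b": [], "c": [], "d": [],
}


def parents_of(children):
    par = {v: set() for v in children}
    for p, kids in children.items():
        for k in kids:
            par[k].add(p)
    return par


def dominators(children):
    """Dom[v] = set of nodes lying on every root-to-v path (v included)."""
    par = parents_of(children)
    dom = {}
    for v in TopologicalSorter(par).static_order():  # parents before children
        common = set.intersection(*(dom[p] for p in par[v])) if par[v] else set()
        dom[v] = common | {v}
    return dom


def lsa(children, Z):
    """Lowest node lying on every path from the root to every z in Z."""
    dom = dominators(children)
    common = set.intersection(*(dom[z] for z in Z))
    # Nodes dominating a fixed node form a chain; the lowest has the largest Dom set.
    return max(common, key=lambda v: len(dom[v]))


def descendants(children, v):
    seen, stack = {v}, [v]
    while stack:
        for k in children[stack.pop()]:
            if k not in seen:
                seen.add(k)
                stack.append(k)
    return seen


def paths(children, t, x):
    """All directed paths (node lists) from t down to x."""
    if t == x:
        return [[x]]
    return [[t] + p for k in children[t] for p in paths(children, k, x)]


def simple_trek_edges(children, x, y, tops):
    """Edges lying on some simple trek from x to y whose top is in `tops`."""
    found = set()
    for t in tops:
        for p1, p2 in product(paths(children, t, x), paths(children, t, y)):
            if set(p1) & set(p2) == {t}:  # the two paths meet only at the top
                found |= set(zip(p1, p1[1:])) | set(zip(p2, p2[1:]))
    return found


def check_lemma17(children, Z):
    below = descendants(children, lsa(children, Z))  # LSA(Z) and everything under it
    covered = set()
    for x, y in combinations(sorted(Z), 2):
        covered |= simple_trek_edges(children, x, y, below)
    needed = {(p, k) for p in below for k in children[p]
              if descendants(children, k) & set(Z)}
    return needed <= covered
```

For example, `lsa(CHILDREN, {"a", "b"})` is the top u of the 4-cycle, while `lsa(CHILDREN, {"a", "d"})` is the root r; `check_lemma17` returns True for these and other subsets of leaves.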

Proof (of Proposition 1)

Let \(M^+=\mathcal {N}^\oplus _Z\). Let \(M^-\) be the graph obtained from \(M^+\) by ignoring the direction of all tree edges and then suppressing LSA(\(Z,\mathcal {N}^+\)), that is, the unrooted network induced by \(M^+\). Denote by \(M'\) the graph obtained by ignoring the directions of all tree edges in \(M^+\), so that suppressing degree-two nodes of either \(M^-\) or \(M'\) gives \((\mathcal {N}^+_Z)^-\). Let K be the graph formed by all edges in simple treks in \(\mathcal {N}^-\) from x to y, for all \(x,y\in Z\), so that suppressing degree-two nodes in K gives \((\mathcal {N}^-)_Z\). It therefore suffices to show that \(M'=K\) or \(M^-=K\), according to whether or not LSA(\(Z,\mathcal {N}^+\)) equals LSA(\(X,\mathcal {N}^+\)).
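The comparisons in this proof are made up to suppression of degree-two nodes. As a minimal sketch of that operation, assuming an undirected graph given as a list of edges (the representation and the function name are illustrative, not from the paper), each node with exactly two incident edges u–v and v–w is replaced by a single edge u–w until no such node remains. Parallel edges are kept rather than merged.

```python
from collections import Counter


def suppress_degree_two(edges):
    """Suppress degree-two nodes of an undirected graph given as a list of edges.

    Each node v with exactly two incident edges u-v and v-w is deleted and
    replaced by a single edge u-w; this repeats until no such node remains.
    Parallel edges are kept (the result is a multiset of edges); a self-loop
    at v is not treated as two incident edges.
    """
    edges = [tuple(e) for e in edges]
    while True:
        degree = Counter(v for e in edges for v in e)
        for v, d in degree.items():
            if d != 2:
                continue
            incident = [e for e in edges if v in e]
            if len(incident) == 2:
                break
        else:
            return edges  # no suppressible node left
        (a,) = [n for n in incident[0] if n != v]
        (b,) = [n for n in incident[1] if n != v]
        edges = [e for e in edges if v not in e] + [(a, b)]
```

For instance, the path x–p–q–y with an extra leaf edge q–z reduces to the three edges x–q, q–y, q–z, since only p has degree two.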

First we show that if LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)) then \(M'=K\), by arguing that \(M'\) and K have the same edges. Let e be an edge of \(M'\). Since LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)), \(M'\) is a subgraph of \(\mathcal {N}^-\) and e is directed in \(M^+\). By Lemma 17, e is in a simple trek in \(M^+\) from x to y, for some \(x,y\in Z\). This trek induces a simple trek in \(M'\) from x to y, and therefore a simple trek in \(\mathcal {N}^-\) from x to y. Thus, e is in K.

Now let e be an edge of K. Then there exists a simple trek \((\overline{P_1},\overline{P_2})\) in \(\mathcal {N}^-\) from x to y, for some \(x,y\in Z\), containing e. Let \(v=\)top\((\overline{P_1},\overline{P_2})\) and let T be the sequence of incident edges in \(\mathcal {N}^+\) from x to y consisting of the edges that induce those in \(\overline{P_1}\) and \(\overline{P_2}\). Since \((\overline{P_1},\overline{P_2})\) is simple, T does not have repeated edges. Following T in \(\mathcal {N}^+\) from x to y, edges are first traversed “uphill” (in reverse direction) until there is a first “downhill” edge (uw). The next edge in T cannot be uphill, as otherwise w would be a hybrid node and \((\overline{P_1},\overline{P_2})\) would not have been a trek in \(\mathcal {N}^-\). The same argument applies to all subsequent consecutive edges in T until we reach y. Thus, there is a simple trek \((P_1,P_2)\) from x to y in \(\mathcal {N}^+\) with top u. Note that u must be below or equal to LSA(\(Z,\mathcal {N}^+\)), since otherwise the trek would not be simple. Moreover, \(P_1\) and \(P_2\) contain only edges in \(M^+\), and thus in \(M'\) once the directions of the tree edges are omitted. Thus, e is in \(M'\), so \(K=M'.\)

If LSA(\(Z,\mathcal {N}^+\))\(=\)LSA(\(X,\mathcal {N}^+\)), then \(M^-=K\) follows from a straightforward modification of the previous argument to account for the suppression of LSA\((Z,\mathcal {N}^+)\) in both \(M^-\) and K. \(\square \)
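The uphill/downhill argument in the proof above amounts to a simple test on a walk in the rooted network: every edge must be traversed against its direction until the first edge traversed along it, and never against its direction afterwards; the node where the direction switches is the top. The sketch below applies this test to a hypothetical small level-1 network (a 4-cycle u, v1, h, v2 with hybrid node h); the edge set and function name are illustrative assumptions.

```python
# Hypothetical rooted level-1 network (4-cycle u-v1-h-v2, hybrid node h),
# given as a set of directed edges (parent, child).
EDGES = {("r", "u"), ("r", "d"), ("u", "v1"), ("u", "v2"), ("v1", "h"),
         ("v1", "a"), ("v2", "h"), ("v2", "c"), ("h", "b")}


def trek_top(walk, edges=EDGES):
    """Return the top of the simple trek traced by `walk` (nodes from x to y).

    Returns None if the walk repeats a node, or goes uphill again after
    having gone downhill (in which case it is not a trek).
    """
    if len(set(walk)) != len(walk):
        return None
    top, going_down = walk[0], False
    for s, t in zip(walk, walk[1:]):
        if (s, t) in edges:        # downhill: edge followed in its direction
            going_down = True
        elif (t, s) in edges:      # uphill: edge followed against its direction
            if going_down:
                return None
            top = t
        else:
            raise ValueError(f"{s}-{t} is not an edge")
    return top
```

For example, the walk a, v1, u, v2, c is a simple trek with top u, whereas the walk a, v1, h, v2, c is rejected: it enters the hybrid node h downhill from v1 and leaves it uphill toward v2, which is exactly the situation excluded in the proof.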


About this article


Cite this article

Baños, H. Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model. Bull Math Biol 81, 494–534 (2019). https://doi.org/10.1007/s11538-018-0485-4


Keywords

  • Coalescent theory
  • Phylogenetics
  • Networks
  • Concordance factors