Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model

  • Special Issue: Algebraic Methods in Phylogenetics
Bulletin of Mathematical Biology

Abstract

We show that many topological features of level-1 species networks are identifiable from the distribution of the gene tree quartets under the network multi-species coalescent model. In particular, every cycle of size at least 4 and every hybrid node in a cycle of size at least 5 are identifiable. This is a step toward justifying the inference of such networks which was recently implemented by Solís-Lemus and Ané. We show additionally how to compute quartet concordance factors for a network in terms of simpler networks, and explore some circumstances in which cycles of size 3 and hybrid nodes in 4-cycles can be detected.
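
For orientation, here is a minimal sketch of a quartet concordance factor in the simplest setting only: a four-taxon species tree with no hybridization. It relies on the standard multispecies coalescent result that, for a species quartet tree ab|cd whose internal edge has length t in coalescent units, the gene quartet ab|cd has probability 1 - (2/3)e^{-t} and each of the other two has probability (1/3)e^{-t}. The function name is hypothetical, and the sketch does not implement the network computations of the article.

```python
import math


def quartet_cf_on_tree(t):
    """Quartet concordance factors (ab|cd, ac|bd, ad|bc) for a species
    quartet tree ab|cd with internal edge length t (coalescent units).

    Assumption: plain multispecies coalescent on a tree, no hybrid nodes.
    """
    if t < 0:
        raise ValueError("edge length must be non-negative")
    minor = math.exp(-t) / 3.0  # probability of each discordant quartet
    return (1.0 - 2.0 * minor, minor, minor)


if __name__ == "__main__":
    for t in (0.0, 0.5, 2.0):
        print(t, quartet_cf_on_tree(t))
```

With t = 0 the three quartets are equally likely, and as t grows the quartet matching the species tree dominates.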

References

  • Allman ES, Degnan JH, Rhodes JA (2011) Identifying the rooted species tree from the distribution of unrooted gene trees under the coalescent. J Math Biol 62(6):833–862

  • Ané C, Larget B, Baum DA, Smith SD, Rokas A (2007) Bayesian estimation of concordance among gene trees. Mol Biol Evolut 24(2):412–426

  • Arnold ML (1997) Natural hybridization and evolution, vol 53. Oxford University Press, Oxford

  • Bapteste E, van Iersel L, Janke A, Kelchner S, Kelk S, McInerney JO, Morrison DA, Nakhleh L, Steel M, Stougie L, Whitfield J (2013) Networks: expanding evolutionary thinking. Trends Genet 29(8):439–441

  • Carstens BC, Knowles LL (2007) Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers. Syst Biol 56(3):400–411

  • Degnan JH (2010) Probabilities of gene trees with intraspecific sampling given a species tree. In: Knowles LL, Kubatko LS (eds) Estimating species trees: practical and theoretical aspects. Wiley-Blackwell, pp 53–78. ISBN 0470526858

  • Ellstrand NC, Whitkus R, Rieseberg LH (1996) Distribution of spontaneous plant hybrids. Proc Nat Acad Sci U S A 93(10):5090–5093

  • Gusfield D, Bansal V, Bafna V, Song YS (2007) A decomposition theory for phylogenetic networks and incompatible characters. J Comput Biol 14(10):1247–1272

  • Huber KT, van Iersel L, Moulton V, Scornavacca C, Wu T (2017) Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica 77(1):173–200

  • Huber KT, Moulton V, Semple C, Wu T (2017) Quarnet inference rules for level-1 networks. https://arxiv.org/pdf/1711.06720.pdf

  • Keijsper JCM, Pendavingh RA (2014) Reconstructing a phylogenetic Level-1 network from quartets. Bull Math Biol 76(10):2517–2541

  • Linder CR, Rieseberg LH (2004) Reconstructing patterns of reticulate evolution in plants. Am J Bot 91(10):1700–1708

  • Liu L, Yu L, Edwards SV (2010) A maximum pseudo-likelihood approach for estimating species trees under the coalescent model. BMC Evolut Biol 10(1):302

  • Mallet J (2005) Hybridization as an invasion of the genome. Trends Ecol Evolut 20(5):229–237. Special issue: invasions, guest edited by Michael E. Hochberg and Nicholas J. Gotelli

  • Meng C, Kubatko LS (2009) Detecting hybrid speciation in the presence of incomplete lineage sorting using gene tree incongruence: a model. Theor Popul Biol 75(1):35–45

  • Nakhleh L (2010) Evolutionary phylogenetic networks: models and issues. In: Heath L, Ramakrishnan N (eds) Problem solving handbook in computational biology and bioinformatics. Springer, Boston, pp 125–158

  • Noor MA, Feder JL (2006) Speciation genetics: evolving approaches. Nat Rev Genet 7(11):851–861

  • Pamilo P, Nei M (1988) Relationships between gene trees and species trees. Mol Biol Evolut 5:568–583

  • Pollard DA, Iyer VN, Moses AM, Eisen MB (2006) Widespread discordance of gene trees with species tree in Drosophila: evidence for incomplete lineage sorting. PLoS Genet 2(10):1634–1647

  • Rieseberg LH, Baird SJ, Gardner KA (2000) Hybridization, introgression, and linkage evolution. Plant Mol Biol 42(1):205–224

  • Rosselló F, Valiente G (2009) All that glisters is not galled. Math Biosci 221(1):54–59

  • Semple C, Steel M (2005) Phylogenetics. Oxford University Press, Oxford

  • Solís-Lemus C, Ané C (2016) Inferring phylogenetic networks with maximum pseudolikelihood under incomplete lineage sorting. PLoS Genet 12(3):e1005896

  • Solís-Lemus C, Ané C, Yang M (2016) Inconsistency of species tree methods under gene flow. Syst Biol 65(5):843–851

  • Steel M (2016) Phylogeny: discrete and random processes in evolution. SIAM, Philadelphia

  • Sullivant S, Talaska K, Draisma J (2010) Trek separation for Gaussian graphical models. Ann Statist 38(3):1665–1685

  • Syring J, Willyard A, Cronn R, Liston A (2005) Evolutionary relationships among Pinus (Pinaceae) subsections inferred from multiple low-copy nuclear loci. Am J Bot 92(12):2086–2100

  • Wakeley J (2008) Coalescent theory: an introduction, vol 58. Roberts and Company Publishers, Englewood

  • Yu Y, Degnan JH, Nakhleh L (2014) Maximum likelihood inference of reticulate evolutionary histories. PNAS 111(296–305):11

  • Yu Y, Degnan JH, Nakhleh L (2012) The probability of a gene tree topology within a phylogenetic network with applications to hybridization detection. PLoS Genet 8:e1002660

  • Yu Y, Than C, Degnan JH, Nakhleh L (2011) Coalescent histories on phylogenetic networks and detection of hybridization despite incomplete lineage sorting. Syst Biol 60(2):138–149

  • Zhang C, Ogilvie HW, Drummond AJ, Stadler T (2018) Bayesian inference of species networks from multilocus sequence data. Mol Biol Evolut 35(2):504–517

  • Zhu J, Yu Y, Nakhleh L (2016) In the light of deep coalescence: revisiting trees within networks. BMC Bioinform 17:415

  • Zhu S, Degnan J (2017) Displayed trees do not determine distinguishability under the network multispecies coalescent. Syst Biol 66:283–298

Acknowledgements

The author deeply thanks John A. Rhodes and Elizabeth S. Allman for their technical assistance and suggestions during the development of this work, and the reviewers for their valuable suggestions and observations.

Author information

Corresponding author

Correspondence to Hector Baños.

Additional information

This research was supported in part by the National Institutes of Health Grant R01 GM117590, awarded under the Joint DMS/NIGMS Initiative to Support Research at the Interface of the Biological and Mathematical Sciences.

Appendix

Here we prove Proposition 1 of Section 2. The argument uses the following lemma.

Lemma 17

Let \(\mathcal {N}^+\) be a (metric or topological) rooted network on X and let \(Z\subset X\). For any edge e below LSA(Z), with a descendant in Z, there are \(x,y\in Z\) such that e is in a simple trek in \(\mathcal {N}^+\) from x to y whose edges are below LSA(Z).
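
The statement can be checked by brute force on a small example. The sketch below is only an illustration under stated assumptions: the toy network, the function names, and the adjacency-dictionary encoding are hypothetical; a trek from x to y is taken to be a pair of directed paths from a common top node to x and to y, simple when the two paths share only the top; and LSA(Z) is taken to be the lowest node lying on every directed path from the root to an element of Z.

```python
from itertools import product

# Hypothetical toy rooted network: hybrid node h has parents a and d.
NETWORK = {
    "r": ["a", "d"],
    "a": ["x1", "h"],
    "d": ["h", "x3"],
    "h": ["x2"],
}


def paths(adj, src, dst):
    """All directed paths from src to dst, each as a list of nodes."""
    if src == dst:
        return [[src]]
    return [[src] + rest
            for nxt in adj.get(src, [])
            for rest in paths(adj, nxt, dst)]


def lsa(adj, root, Z):
    """Lowest node that lies on every root-to-z path, for all z in Z."""
    all_paths = [p for z in Z for p in paths(adj, root, z)]
    stable = set.intersection(*(set(p) for p in all_paths))
    # Stable nodes all lie on the first path; take the lowest of them.
    return max(stable, key=all_paths[0].index)


def lemma17_holds(adj, root, Z):
    """Check that every edge below LSA(Z) with a descendant in Z lies in
    a simple trek between two elements of Z whose edges are below LSA(Z)."""
    nodes = set(adj) | {v for vs in adj.values() for v in vs}
    top_lsa = lsa(adj, root, Z)
    below = {n for n in nodes if paths(adj, top_lsa, n)}
    trek_edges = set()
    for x, y in product(Z, repeat=2):
        if x == y:
            continue
        for top in below:
            for p1, p2 in product(paths(adj, top, x), paths(adj, top, y)):
                if set(p1) & set(p2) == {top}:  # simple trek
                    trek_edges |= set(zip(p1, p1[1:]))
                    trek_edges |= set(zip(p2, p2[1:]))
    target = {(u, v) for u in below for v in adj.get(u, [])
              if any(paths(adj, v, z) for z in Z)}
    return target <= trek_edges


if __name__ == "__main__":
    print(lsa(NETWORK, "r", ["x1", "x2", "x3"]))
    print(lemma17_holds(NETWORK, "r", ["x1", "x2", "x3"]))
```

Path enumeration is exponential in general, so this is suitable only for sanity checks on very small networks.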

Proof

Let \(x\in Z\) be below e. By Lemma 2 there exists \(y\in Z\) with LSA(xy) above e.

Suppose y is not below e. Let \(P_x\) be a path from LSA(xy) to x containing e and let \(P_y\) be a path from LSA(xy) to y. Let u be the minimal node in the intersection of \(P_x\) and \(P_y\). Since y is not below e, u cannot be below e. Then the subpath of \(P_x\) from u to x, which contains e, and the subpath of \(P_y\) from u to y form a simple trek containing e.

Now assume y is below e. Since e is below LSA(xy), there exists a path from LSA(xy) to one of y or x that does not pass through the child of e. Without loss of generality suppose such a path \(P_y\) goes from LSA(xy) to y. Let \(P_x\) be a path from LSA(xy) to x that passes through e. Let \(A=A(P_x,P_y)\) be the set of nodes above e, common to \(P_y\) and \(P_x\). Let \(a\in A\) be the minimal node in A.

Let \(B(P_y,P_x)\) be the set of nodes below e, common to \(P_y\) and \(P_x\). We may assume that we choose \(P_x\) and \(P_y\) such that \(B=B(P_y,P_x)\) has minimal cardinality. If \(B=\emptyset \) then the desired trek is easily constructed, with top a. So suppose \(B\ne \emptyset \) has minimal element \(b^-\) and maximal element \(b^+\). We are going to contradict the minimality of B. Note that \(b^+\) must be the hybrid node of a cycle containing e (see Fig. 25 for a graphical reference).

Since \(b^-\) is not LSA(xy), there exists a path \(P^*\) from LSA(xy) to one of x or y that does not pass through \(b^-\). Note that \(P^*\) has to intersect at least one of \(P_y\) or \(P_x\) at an internal node below \(b^-\). Let \(C_1\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_y\), and let \(C_2\) be the set of nodes below \(b^-\) common to \(P^*\) and \(P_x\). Let c be the maximal node in \(C_1\cup C_2\). We can assume, without loss of generality, that c is in \(P_y\). This is because if, instead, c were in \(P_x\), we could construct paths \(P_x'\) and \(P_y'\) where \(P_i'\) contains all the edges in \(P_i\) above \(b^-\) and all edges of \(P_j\) below \(b^-\) for \(i,j\in \{x,y\}\), \(i\ne j\). Note that \(P_x'\) passes through e and does not contain c, while \(P_y'\) does not pass through e, contains c, and \(B=B(P_y',P_x')\).

Denote by W the set of nodes in \((P^*\cap P_y)\cup (P^*\cap P_x)\) and let w be the minimal node of W above \(b^-\). Since \(\mathcal {N}^+\) is binary, w cannot be a or \(b^+\) (see Fig. 25 for a graphical reference). There are 5 different cases for the location of w in the network formed by the paths \(P_y\) and \(P_x\):

  1. w is in \(P_y\), above \(b^+\) but below a.

  2. w is in \(P_x\), above \(b^+\) but below e.

  3. w is in \(P_x\), above e but below a.

  4. w is in one or more of \(P_x\) or \(P_y\), above a.

  5. w is in one or more of \(P_x\) or \(P_y\), above \(b^-\) but below \(b^+\).

Figure 25 depicts in gray the graph formed by the paths \(P_y\) and \(P_x\), and in black the possible subpaths of \(P^*\) from w to c. In any of cases 1, 2 or 3 we can find a simple trek containing e, as depicted in Fig. 26, by choosing the appropriate edges, and thus B was not minimal. For cases 4 and 5 there are two possibilities: (i) w is in both \(P_y\) and \(P_x\); (ii) w is in only one of \(P_y\) or \(P_x\). For case 4 (i), we can find a simple trek as depicted on the left in Fig. 27. For case 4 (ii), we first find the node in A that is directly above w; then, as depicted on the left of Fig. 27, we can find a simple trek.

For case 5 we do not find a simple trek directly; instead we construct two paths \(P_1\) and \(P_2\) from LSA(xy) to x and y, respectively, only one of which contains e, such that \(B(P_1,P_2)\) has at least one fewer node than B. For case 5 (i), we take \(P_1\) to be \(P_x\), and let \(P_2\) consist of the edges of \(P_y\) above w, the edges of \(P^*\) between w and c, and the edges of \(P_y\) below c. For case 5 (ii), we assume without loss of generality that w is in \(P_x\). Let b be the node in B directly above w. Let \(P_1\) be the path containing the edges in \(P_x\) that are above b, the edges in \(P_y\) that are below b but above the node \(b'\in B\) directly below w, and finally the edges in \(P_x\) below \(b'\). Let \(P_2\) be the path containing the edges in \(P_y\) that are above b, the edges in \(P_x\) that are above w but below b, the edges in \(P^*\) that are above c but below w, and finally the edges in \(P_y\) that are below c. Figure 27 (right) depicts \(P_1\) (red) and \(P_2\) (blue) for (i) and (ii). Since \(B(P_1,P_2)\) has at least one fewer node than B, and B was assumed to have minimal cardinality, we reach a contradiction. \(\square \)

Fig. 25

In gray we see the subgraph formed by P and \(P'\); the dashed edges indicate that P and \(P'\) could intersect, and the dotted segments represent a succession of edges. In black we see the different cases of the possible edges in \(P^*\) above b but below a

Fig. 26

The treks in case 1 (left), case 2 (center), and case 3 (right)

Fig. 27

(Left) The treks in the two possibilities of case 4. (Right) The two possibilities of case 5, where the black segments represent possible edges that are both red and blue

Proof (of Proposition 1)

Let \(M^+=\mathcal {N}^\oplus _Z\). Let \(M^-\) be the graph obtained from \(M^+\) by ignoring the direction of all tree edges and then suppressing LSA(\(Z,\mathcal {N}^+\)), that is, the unrooted network induced from \(M^+\). Denote by \(M'\) the graph obtained by ignoring the directions of all tree edges in \(M^+\), so that suppressing degree-two nodes of either \(M^-\) or \(M'\) gives \((\mathcal {N}^+_Z)^-\). Let K be the graph obtained by considering all the edges in simple treks in \(\mathcal {N}^-\) from x to y for all \(x,y\in Z\), so that suppressing degree-two nodes in K gives \((\mathcal {N}^-)_Z\). Showing either \(M'=K\) or \(M^-=K\) will prove the claim.

First we show that if LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)) then \(M'=K\), by arguing that \(M'\) and K have the same edges. Let e be an edge of \(M'\). Since LSA(\(Z,\mathcal {N}^+\))\(\ne \)LSA(\(X,\mathcal {N}^+\)), \(M'\) is a subgraph of \(\mathcal {N}^-\) and e is directed in \(M^+\). By Lemma 17, e is in a simple trek in \(M^+\) from x to y, for some \(x,y\in Z\). This trek induces a simple trek in \(M'\) from x to y, and therefore a simple trek in \(\mathcal {N}^-\) from x to y. Thus, e is in K.

Now let e be an edge of K. Then there exists a simple trek \((\overline{P_1},\overline{P_2})\) in \(\mathcal {N}^-\) from x to y, for some \(x,y\in Z\), containing e. Let \(v=\)top\((\overline{P_1},\overline{P_2})\) and let T be the sequence of incident edges in \(\mathcal {N}^+\) from x to y consisting of the edges inducing those in \(\overline{P_1}\) and \(\overline{P_2}\). Since \((\overline{P_1},\overline{P_2})\) is simple, T does not have repeated edges. Following T in \(\mathcal {N}^+\) from x to y, edges are first traversed “uphill” (in reverse direction) until there is a first “downhill” edge (uw). The next edge in T cannot be uphill, as otherwise it would be hybrid and \((\overline{P_1},\overline{P_2})\) would not have been a trek in \(\mathcal {N}^-\). This argument applies to all consecutive edges in T until we end at y. Thus, there is a simple trek \((P_1,P_2)\) from x to y in \(\mathcal {N}^+\) with top u. Note that u must be below or equal to LSA(\(Z,\mathcal {N}^+\)), since otherwise the trek would not be simple. Moreover, \(P_1\) and \(P_2\) contain only edges in \(M^+\), and thus in \(M'\) once the directions of the tree edges are omitted. Thus, e is in \(M'\), so \(K=M'.\)
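
The “uphill, then downhill” shape used in this argument can be checked mechanically. The following is a minimal sketch, assuming the rooted network is given as a set of directed (parent, child) edges and the walk as a sequence of nodes; the toy network and the name `trek_top` are hypothetical.

```python
# Hypothetical toy rooted network as (parent, child) edges;
# h is a hybrid node with parents a and d.
EDGES = {
    ("r", "a"), ("r", "d"),
    ("a", "x1"), ("a", "h"),
    ("d", "h"), ("d", "x3"),
    ("h", "x2"),
}


def trek_top(walk, directed_edges):
    """Return the top node if the walk goes uphill and then only downhill
    in the rooted network; return None if an uphill step follows a
    downhill one (so the walk is not a trek)."""
    steps = []
    for u, v in zip(walk, walk[1:]):
        if (v, u) in directed_edges:
            steps.append("up")      # edge traversed against its direction
        elif (u, v) in directed_edges:
            steps.append("down")    # edge traversed along its direction
        else:
            raise ValueError(f"{u}-{v} is not an edge of the network")
    n_up = steps.count("up")
    if steps != ["up"] * n_up + ["down"] * (len(steps) - n_up):
        return None
    return walk[n_up]


if __name__ == "__main__":
    print(trek_top(["x1", "a", "h", "x2"], EDGES))
    print(trek_top(["x1", "a", "h", "d", "x3"], EDGES))
```

The second walk passes through the hybrid node h going downhill and then uphill, which is exactly the configuration the argument rules out.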

If LSA(\(Z,\mathcal {N}^+\))\(=\)LSA(\(X,\mathcal {N}^+\)) then \(M^-=K\) follows from a straightforward modification of the previous argument to account for the suppression of LSA\((Z,\mathcal {N}^+)\) in both \(M^-\) and K. \(\square \)

About this article

Cite this article

Baños, H. Identifying Species Network Features from Gene Tree Quartets Under the Coalescent Model. Bull Math Biol 81, 494–534 (2019). https://doi.org/10.1007/s11538-018-0485-4

  • DOI: https://doi.org/10.1007/s11538-018-0485-4
