Abstract
Phylogenetic networks generalize phylogenetic trees, and have been introduced in order to describe evolution in the case of transfer of genetic material between coexisting species. There are many classes of phylogenetic networks, which can all be modeled as families of graphs with labeled leaves. In this paper, we focus on rooted and unrooted level-k networks and provide enumeration formulas (exact and asymptotic) for rooted and unrooted level-1 and level-2 phylogenetic networks with a given number of leaves. We also prove that some parameters of these networks (such as their number of cycles) are asymptotically normally distributed. These results are obtained by first providing a recursive description (also called combinatorial specification) of our networks, and by next applying classical methods of enumerative, symbolic and analytic combinatorics.
Notes
The tail of an arc is by definition its starting point. Its arrival point is called its head.
Although it is also very classical, the case of unlabeled objects (with their corresponding ordinary generating functions) will not be useful in our work, and is therefore omitted from our presentation.
Aperiodicity is needed only for the third item below. The definition of aperiodicity is omitted from this paper, and can be found in Definition IV.5 of Flajolet and Sedgewick (2008). A sufficient condition for a power series to be aperiodic (which applies to all examples considered in this paper) is to have \(\phi _n >0\) for all n.
References
Boc A, Diallo AB, Makarenkov V (2012) T-rex: a web server for inferring, validating and visualizing phylogenetic trees and networks. Nucleic Acids Res 40(W1):W573–W579
Chang K-Y, Hon W-K, Thankachan SV (2018) Compact encoding for galled-trees and its applications. In: 2018 Data Compression Conference, pp 297–306
Drmota M (2009) Random trees. Springer, Berlin
Duchon P, Flajolet P, Louchard G, Schaeffer G (2004) Boltzmann samplers for the random generation of combinatorial structures. Comb Probab Comput 13:577–625
Flajolet P, Sedgewick R (2008) Analytic combinatorics. Cambridge University Press, Cambridge
Flajolet P, Zimmermann P, Cutsem BV (1994) A calculus for the random generation of labelled combinatorial structures. Theor Comput Sci 132(1–2):1–35
Fuchs M, Gittenberger B, Mansouri M (2019) Counting phylogenetic networks with few reticulation vertices: tree-child and normal networks. Aust J Comb 73(2):385–423
Gambette P, Berry V, Paul C (2012) Quartets and unrooted phylogenetic networks. J Bioinform Comput Biol 10(4):1250004.1–1250004.23
Gambette P, van Iersel L, Kelk S, Pardi F, Scornavacca C (2016) Do branch lengths help to locate a tree in a phylogenetic network? Bull Math Biol 78(9):1773–1795
Gambette P, Berry V, Paul C (2009) The structure of level-\(k\) phylogenetic networks. In: Twentieth annual symposium on combinatorial pattern matching (CPM’09), vol 5577 of Lecture notes in computer science. Springer, pp 289–300
Gunawan AD, Rathin J, Zhang L (2020) Counting and enumerating galled networks. Discrete Appl Math 283:644–654
Huber K, Moulton V, Wu T (2016) Transforming phylogenetic networks: moving beyond tree space. J Theor Biol 404:30–39
Huber K, van Iersel L, Moulton V, Scornavacca C, Wu T (2017) Reconstructing phylogenetic level-1 networks from nondense binet and trinet sets. Algorithmica 77(1):173–200
Huber K, Moulton V, Semple C, Wu T (2018) Quarnet inference rules for level-1 networks. Bull Math Biol 80:2137–2153
Janssen R, Jones M, Erdös PL, van Iersel L, Scornavacca C (2018) Exploring the tiers of rooted phylogenetic network space using tail moves. Bull Math Biol 80:2177–2208
Labarre A, Verwer S (2014) Merging partially labelled trees: hardness and a declarative programming solution. IEEE/ACM Trans Comput Biol Bioinform 11(2):389–397
Lempel A, Even S, Cederbaum I (1967) An algorithm for planarity testing of graphs. In: Theory of graphs: international symposium, pp 215–232
McDiarmid C, Semple C, Welsh D (2015) Counting phylogenetic networks. Ann Comb 19(1):205–224
OEIS Foundation Inc. (2019) The on-line encyclopedia of integer sequences. http://oeis.org
Posada D, Crandall KA (2001) Intraspecific gene genealogies: trees grafting into networks. Trends Ecol Evol 16(1):37–45
Semple C, Steel M (2006) Unicyclic networks: compatibility and enumeration. IEEE/ACM Trans Comput Biol Bioinform 3:398–401
van Iersel L, Moulton V (2014) Trinets encode tree-child and level-2 phylogenetic networks. J Math Biol 68(7):1707–1729
van Iersel L, Moulton V (2018) Leaf-reconstructibility of phylogenetic networks. SIAM J Discrete Math 32:2047–2066
van Iersel L, Keijsper J, Kelk S, Stougie L, Hagen F, Boekhout T (2009) Constructing level-2 phylogenetic networks from triplets. IEEE/ACM Trans Comput Biol Bioinform 6(4):667–681
Willems M, Tahiri N, Makarenkov V (2014) A new efficient algorithm for inferring explicit hybridization networks following the Neighbor-Joining principle. J Bioinform Comput Biol 12(5):1450024
Acknowledgements
This work was supported by a “junior guest” grant by the LABRI and bilateral Austrian-Taiwanese Project FWF-MOST, Grants I 2309-N35 (FWF) and MOST-104-2923-M-009-006-MY3 (MOST). We thank Carine Pivoteau for her insights about random generation of combinatorial structures as well as two anonymous reviewers for their useful comments.
Appendix
1.1 Case analysis for unrooted level-2 generators
In the pictures below, we use thick lines to represent paths containing at least two internal nodes, each incident with a cut-edge that leads to another pointed unrooted level-2 network. We use # to represent the fictitious root in the pointed network, v to denote its neighbour, and \({\mathcal {U}}\) to represent any pointed network.
1.1.1 Case 1: One edge with an attached network
One edge of the generator carries a sequence of at least two incident cut-edges. Because multiple edges are not allowed, it cannot be one of the two edges incident to v. So, it can only be one of the two edges not incident to v (which are not distinguished). The sequence is unoriented because of symmetry, explaining the factor \(\tfrac{1}{2}\) below.
1.1.2 Case 2: Two edges with attached networks
Case 2A: Two edges of the generator carry exactly one incident cut-edge. Since multiple edges are not allowed, it can either be one edge incident to v and one not, or both edges not incident to v. In the latter case, the two edges should not be distinguished, hence the factor \(\tfrac{1}{2}\).
Case 2B: One edge of the generator carries a single incident cut-edge and another edge carries a sequence of at least two incident cut-edges. Again, these cannot be the two edges incident to v. The only case where symmetries need to be taken care of is when the two edges are those not incident to v: in this case, the sequence is not oriented, hence the factor \(\tfrac{1}{2}\). In all other cases, the orientation of the sequence is determined by the presence of the fictitious root or the outgoing arc from the other edge with an attached network.
Case 2C: Two edges of the generator (but not the two incident to v, as before) carry a sequence of at least two incident cut-edges. If one arc is incident to v and the other not, then both sequences are oriented and there is no symmetry factor. If the two arcs are those not incident to v, then the two sequences they carry can be seen as an unordered pair of oriented sequences, seen up to symmetry w.r.t. the vertical axis. This yields a factor \(\tfrac{1}{2}\) since the pair is unordered, and another factor \(\tfrac{1}{2}\) to account for the symmetry w.r.t. the vertical axis.
1.1.3 Case 3: Three edges with attached networks
Case 3A: Three edges of the generator carry exactly one incident cut-edge. The unused edge can either be incident with v or not. In both cases, we have a factor \(\tfrac{1}{2}\) because of symmetry.
Case 3B: Two edges of the generator carry a single incident cut-edge and one carries a sequence of at least two incident cut-edges. The only cases where a symmetry comes into play here are when the edges carrying a single incident cut-edge are either the two edges incident to v or the two edges not incident to v. This yields the factor \(\tfrac{1}{2}\) in these two cases. Moreover, all sequences are oriented, because of the presence of the fictitious root or the single incident cut-edges.
Case 3C: One edge of the generator carries a single incident cut-edge and two edges carry a sequence of at least two incident cut-edges. Similarly to the previous case, we obtain a factor \(\tfrac{1}{2}\) for symmetry reasons when the two edges carrying sequences are either the two edges incident to v or the two edges not incident to v. Moreover, all sequences are oriented, because of the presence of the fictitious root or the single incident cut-edge.
Case 3D: Three edges of the generator carry a sequence of at least two incident cut-edges. In both cases, we have a factor \(\tfrac{1}{2}\) for symmetry reasons, but all sequences are oriented by the presence of the fictitious root, or of the sequence on the edge(s) incident to v.
1.1.4 Case 4: Four edges with attached networks
Case 4A: The four edges of the generator each carry exactly one incident cut-edge. In this case, the two edges incident to v can be exchanged without modifying the network, and the same holds for the two edges not incident to v. This yields a factor \(\tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{4}\) due to symmetries.
Case 4B: Three edges of the generator carry a single incident cut-edge and the fourth one carries a sequence of at least two incident cut-edges. If this fourth edge is one incident to v, then the sequence it carries is oriented by the presence of the fictitious root, but the two arcs pending on the edges not incident to v are symmetric, hence a factor \(\tfrac{1}{2}\). If on the contrary the edge carrying the sequence is not incident to v, then the sequence is also oriented, this time because of the arcs attached to the edges incident to v. Moreover, the picture has a symmetry w.r.t. the vertical axis, hence a factor \(\tfrac{1}{2}\).
Case 4C: Two edges carry a single incident cut-edge and the two others carry a sequence of at least two incident cut-edges. In all cases, the sequences are oriented, by the presence of either the fictitious root or of the single arcs attached to edges. If the edges carrying sequences are one incident to v and the other not incident to v, all edges are in addition distinguished from each other. In the other two cases, both edges incident to v form an unordered pair, as well as the two edges not incident to v. In each case, we therefore have a factor \(\tfrac{1}{4}\).
Case 4D: One edge of the generator carries a single incident cut-edge and three edges carry a sequence of at least two incident cut-edges. As in the previous case, all sequences are oriented. However, if the two edges incident to v carry a sequence, the picture has a symmetry w.r.t. the vertical axis, hence a factor \(\tfrac{1}{2}\). If on the contrary the two edges not incident to v carry a sequence, these two edges are indistinguishable, hence a factor \(\tfrac{1}{2}\) also in this case.
Case 4E: All four edges of the generator carry a sequence of at least two incident cut-edges. Then all sequences are oriented, but the two edges not incident to v are indistinguishable. The picture has in addition a symmetry w.r.t. the vertical axis. This yields a factor \(\tfrac{1}{4}\).
1.2 Case analysis for the rooted level-2 generator 2b
In the pictures below, we use thick lines to represent paths containing at least one internal node incident with a cut arc which is incident with the root of another rooted level-2 network. All arcs are directed downwards. We use \({\mathcal {L}}\) to represent any rooted level-2 network.
1.2.1 Case 1
Only one arc of the generator carries a sequence of at least one outgoing arc. This arc can only be e or \(e'\) (and these cases are indistinguishable), since otherwise the network would contain multiple arcs, and this is not allowed.
1.2.2 Case 2
Exactly two arcs of the generator carry a sequence of at least one outgoing arc. To avoid multiple arcs, either these two arcs are e and \(e'\) (and those two arcs are symmetric, hence the factor \(\frac{1}{2}\)), or one of them is e or \(e'\) (which are not distinguished) and the other arc is chosen among the three arcs different from e and \(e'\).
1.2.3 Case 3
Exactly three arcs of the generator carry a sequence of at least one outgoing arc. Here, there are two possibilities. Either both e and \(e'\) are among those three arcs (and those two arcs are symmetric, hence the factor \(\frac{1}{2}\)). Or, to avoid multiple arcs, we must choose one of e and \(e'\) (which are not distinguished from each other), and two additional arcs among the three remaining arcs.
1.2.4 Case 4
Exactly four arcs of the generator carry a sequence of at least one outgoing arc. Either both e and \(e'\) are among those four arcs (and those two arcs are symmetric, hence the factor \(\frac{1}{2}\)), so the last two are chosen among the three other arcs of the generator. Or we choose the three arcs of the generator other than e and \(e'\), and e (which is indistinguishable from \(e'\)).
1.2.5 Case 5
All five arcs of the generator carry a sequence of at least one outgoing arc. The fact that e and \(e'\) are symmetric explains the factor \(\frac{1}{2}\).
1.3 Exact enumeration formulas
1.3.1 Unrooted level-2 networks
Proposition 13
For any \(n \ge 1\), the number \(u_n\) of unrooted level-2 phylogenetic networks with \((n+1)\) leaves is given by
Proof (Sketch)
Recall that \(U(z)=z \phi (U(z))\) with \(\phi (z) = \frac{1}{1-\frac{3z^5-16z^4+32z^3-30z^2+12z}{4(1-z)^4}}\). Using first the classical development of \((1-z)^{-n}\) in series (see Eq. (2)), and then the binomial theorem, we have
We continue applying the binomial theorem inside the above formula, isolating each time the term with the lowest degree in the numerator (that is, first \(\tfrac{- 30z^2}{4(1-z)^4}\), second \(\tfrac{32z^3}{4(1-z)^4}\), ...). This yields
The result then follows by expanding \((1-z)^{-4i}\) in series as \((1-z)^{-4i} = \sum _{j \ge 0} \binom{4i+j-1}{j} z^j\) and using the Lagrange inversion formula. \(\square \)
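The first values of \(u_n\) can be checked numerically from the equation \(U(z)=z \phi (U(z))\) alone. The following is a minimal sketch, not the authors' implementation: it applies Lagrange inversion to truncated power series with exact rational arithmetic. It assumes that \(U(z)\) is an exponential generating function (consistent with the labeled setting described in the Notes), so that \(u_n = n!\,[z^n]U(z)\). All function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial


def mul(a, b, n):
    """Product of two power series given as coefficient lists of length n, truncated to n terms."""
    c = [Fraction(0)] * n
    for i, ai in enumerate(a):
        if ai:
            for j in range(n - i):
                c[i + j] += ai * b[j]
    return c


def inverse(a, n):
    """Reciprocal of a power series with a[0] != 0, truncated to n terms."""
    b = [Fraction(0)] * n
    b[0] = 1 / Fraction(a[0])
    for k in range(1, n):
        s = sum(a[i] * b[k - i] for i in range(1, k + 1))
        b[k] = -s * b[0]
    return b


def phi_series(n):
    """First n coefficients of phi(z) = 1 / (1 - P(z) / (4 (1 - z)^4))."""
    # P(z) = 12z - 30z^2 + 32z^3 - 16z^4 + 3z^5, padded or truncated to n terms
    p = ([Fraction(c) for c in (0, 12, -30, 32, -16, 3)] + [Fraction(0)] * n)[:n]
    # (1 - z)^{-4} = sum_j C(j + 3, j) z^j
    g = [Fraction(comb(j + 3, j)) for j in range(n)]
    q = [c / 4 for c in mul(p, g, n)]
    one_minus_q = [-c for c in q]
    one_minus_q[0] += 1
    return inverse(one_minus_q, n)


def unrooted_level2_counts(n_max):
    """u_1, ..., u_{n_max}, assuming u_n = n! [z^n] U(z) (exponential generating function)."""
    phi = phi_series(n_max)
    power = [Fraction(1)] + [Fraction(0)] * (n_max - 1)  # phi^0
    counts = []
    for n in range(1, n_max + 1):
        power = mul(power, phi, n_max)        # now phi^n
        coeff = power[n - 1] / n              # Lagrange: [z^n] U = (1/n) [u^(n-1)] phi(u)^n
        value = factorial(n) * coeff
        assert value.denominator == 1         # sanity check: counts must be integers
        counts.append(int(value))
    return counts


if __name__ == "__main__":
    print(unrooted_level2_counts(3))
```

Under the stated assumption, the first three values are 1, 6 and 135. Exact fractions are used rather than floating point because the coefficients of \(\phi \) are not integers (for instance, its coefficient of \(z^2\) is \(27/2\)).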
1.3.2 Rooted level-2 networks
Proposition 14
For any \(n \ge 1\), the number \(\ell _n\) of rooted level-2 phylogenetic networks with n leaves is given by
Proof (Sketch)
This follows again from the Lagrange inversion formula, using the equation \({L}(z)= z\phi ( {L}(z))\) for the function \(\phi \) given in Theorem 6. The computations involve the usual development of \((1-z)^{-n}\) given by Eq. (2) and the binomial formula, applied following exactly the same steps as in the proof of Proposition 13. Details of the computations are left to the reader. \(\square \)
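For reference, the Lagrange inversion formula is used here in its standard form. Assuming, as in the labeled setting of this paper, that \({L}(z)\) is an exponential generating function, the two identities involved are

\[ [z^n]\, {L}(z) = \frac{1}{n}\, [u^{n-1}]\, \phi (u)^n , \qquad \ell _n = n!\, [z^n]\, {L}(z). \]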
Cite this article
Bouvel, M., Gambette, P. & Mansouri, M. Counting phylogenetic networks of level 1 and 2. J. Math. Biol. 81, 1357–1395 (2020). https://doi.org/10.1007/s00285-020-01543-5
DOI: https://doi.org/10.1007/s00285-020-01543-5
Keywords
- Phylogenetic networks
- Level
- Galled trees
- Counting
- Combinatorial specification
- Generating function
- Asymptotic normal distribution