On the challenge of reconstructing level-1 phylogenetic networks from triplets and clusters

Phylogenetic networks have gained prominence over the years due to their ability to represent complex non-treelike evolutionary events such as recombination or hybridization. Popular combinatorial objects used to construct them are triplet systems and cluster systems, the motivation being that any network N induces a triplet system R(N) and a softwired cluster system S(N). Since in real-world studies it cannot be guaranteed that all triplets/softwired clusters induced by a network are available, it is of particular interest to understand whether subsets of R(N) or S(N) allow one to uniquely reconstruct the underlying network N.
Here we show that even within the highly restricted yet biologically interesting space of level-1 phylogenetic networks it is not always possible to uniquely reconstruct a level-1 network N, even when all triplets in R(N) or all clusters in S(N) are available. On the positive side, we introduce a reasonably large subclass of level-1 networks the members of which are uniquely determined by their induced triplet/softwired cluster systems. Along the way, we also establish various enumerative results, both positive and negative, including results which show that certain special subclasses of level-1 networks N can be uniquely reconstructed from proper subsets of R(N) and S(N). We anticipate these results to be of use in the design of algorithms for phylogenetic network inference.


Introduction
Phylogenetic trees are essentially graph-theoretical trees whose set of leaves is labelled by a set of species or organisms (more abstractly, taxa) and which do not have any degree-two vertices, except possibly the root. They have been the model of choice for many years for shedding light on the evolutionary past of a set of taxa. However, in cases where the taxa are suspected to have undergone reticulate evolutionary events such as hybridization or recombination, trees have been found to not always be appropriate [19]. The need for structures capable of appropriately dealing with such data sets, combined with the fact that different evolutionary processes have given rise to them, has resulted in the introduction of a number of more general structures for representing evolutionary relationships. Subsumed under the name "phylogenetic network", these include hybrid phylogenies [3], ancestral recombination graphs [10], galled trees [9,27], normal networks [28], regular networks [2], tree-sibling networks [5], level-k networks [15,22], median networks [1] and NeighborNets [4], to name just a few, which all generalize a phylogenetic tree in one way or another.
Apart from median networks and NeighborNets, which are special types of split-based phylogenetic network, the basic graph-theoretical structure underpinning a phylogenetic network is a rooted directed acyclic graph (DAG) that has a unique root and whose set of sinks is a given set of taxa. One of the combinatorially simplest types of phylogenetic network, but still complicated enough to be of interest to evolutionary biology, is that of a binary level-1 network (see Fig. 1 for an example). Such structures have attracted a considerable amount of interest in the literature (see e.g. [15,9,17,12]) and can informally be thought of as degree-constrained rooted DAGs with vertex-disjoint undirected cycles. (Formal definitions of all terms will follow in later sections.) However, this simplicity has proven to be deceptive, as the combinatorial structure of such networks has turned out to be more complicated than originally thought (see e.g. [8,11]). Limits on our ability to reconstruct level-1 networks constitute lower bounds on how well we can reconstruct phylogenetic networks more generally. On the other hand, positive results for reconstructing level-1 networks can be an important first step towards algorithms for reconstructing more complex phylogenetic networks.
Figure 1. A binary level-1 phylogenetic network N on X = {1, ..., 10} that is also 4-outwards and saturated. As in all figures, all arcs of the network are directed downwards, so we do not explicitly indicate the direction of arcs.
In this paper, we start by establishing a number of enumerative results for binary level-1 networks. These include upper and lower bounds on the number of vertices and arcs in such networks. We gradually shift our focus onto cluster systems, that is, collections of non-empty subsets of the leaves, and triplet systems, that is, collections of binary phylogenetic trees on just three leaves. Guided by the fact that these systems have been used for reconstructing phylogenetic networks (see e.g. [16] and [13] for recent overviews), we are particularly interested in finding bounds on the minimum size of a triplet system/cluster system required to "uniquely determine" a level-1 network. For trees this question is well understood. Specifically, for a phylogenetic tree T on n ≥ 3 leaves it is well known that T is uniquely determined by its induced triplet system R(T) (leading to an upper bound of $\binom{n}{3}$ for such a minimum-sized set) and that n − 2 carefully chosen triplets from R(T) suffice to uniquely reconstruct T when T is binary (see Theorem 3 of [20] and its Corollary). For this case, it is also well known that T is uniquely determined by its induced cluster system C(T) and that for a minimum-sized cluster system to uniquely determine T, it must have |C(T)| = 2n − 1 elements.
As we shall see, the situation is more complicated for binary level-1 networks. Every level-1 network N induces a triplet system R(N) and a certain cluster system S(N) called the softwired cluster system of N (see [14] for background), but their ability to fully capture the topological structure of N is not as strong as one might hope. Let us say that a binary level-1 network N is encoded by its induced triplet system if for every binary level-1 network N′ such that R(N′) = R(N), we have N = N′. Continuing, we say that a binary level-1 network is 4-outwards if its underlying graph does not have a cycle of length four or less. It is precisely the 4-outwards binary level-1 networks N that are encoded by R(N) as well as S(N) [8] (where we define a binary level-1 network to be encoded by its induced softwired cluster system in an analogous way).
Intriguingly, if R(N′) = R(N) is replaced by R(N) ⊆ R(N′) (as is the case in our formalization of "uniquely determining") then the assumption that N is 4-outwards is no longer strong enough to guarantee uniqueness. A similar observation holds for S(N) (see Sections 6 and 7 for examples for both cases). However, the situation changes for both if, in addition to being 4-outwards, we require that N is saturated, that is, none of its vertices is incident with more than one cut arc (Theorem 6.3 and Theorem 7.3). Simple networks on n ≥ 4 leaves are 4-outwards, saturated networks that have precisely one cycle in their underlying graph. We show that at most 2n − 1 carefully chosen triplets suffice to uniquely determine such networks. As the network on four leaves depicted in Fig. 6 indicates, this bound is however not tight because five triplets suffice in that case (which can be checked by a simple case analysis). Given that R(N) contains at least one triplet for any three leaves of a binary level-1 network N, and so |R(N)| ≥ $\binom{n}{3}$ holds, this suggests that at least for simple phylogenetic networks there is a considerable amount of redundancy in R(N) with regards to reconstructing N from R(N). To establish a similar result for general binary level-1 networks N might not be straightforward in view of Proposition 4.2, which suggests that |R(N)| is not easily expressible in terms of a natural parameter associated with a phylogenetic network N, namely its number of non-trivial cut arcs (see Section 3). This is somewhat surprising in view of the close relationship between the triplet system induced by a binary level-1 network N and its associated softwired cluster system S(N) (see e.g. [8, Proposition 2 and Theorem 1] for details concerning this relationship) because the size of S(N) is closely related to the number of cut arcs of N (Theorem 4.1).
As in the case of triplet systems, it is easy to find examples of binary level-1 networks N that indicate that there is redundancy in the softwired cluster system induced by N with regards to uniquely determining N. Again focusing on simple networks N, we show that at most n carefully chosen (softwired) clusters induced by N suffice to uniquely determine N (Corollary 7.2). However, we do not know if this bound is sharp.
Given that in phylogenetic analyses one is hardly ever guaranteed to have all triplets/clusters induced by a (as yet unknown) phylogenetic network available, the above observations have profound consequences for phylogenetic network reconstruction. One of the most important ones is that a phylogenetic network reconstructed from a triplet or cluster system need not be the network that gave rise to this system.
The paper is organized as follows. In the next section, we present basic terminology of relevance to this paper, including the definition of a level-k network and that of a gall in a level-1 network. In Section 3, we define cut arcs and present formulas for counting the number of vertices, arcs, and galls in a binary level-1 network. These results improve on the results in [6] which imply that the number of vertices in a binary level-1 network on n leaves is linear in n and that the number of hybrid vertices is at most n − 1. In Section 4.1, we formally define the softwired cluster system S(N) induced by a binary level-1 network N and establish Theorem 4.1. In Section 4.2, we define the triplet system R(N) induced by a binary level-1 network N and establish Proposition 4.2. In Section 5, we establish in Proposition 5.1 a relationship between the triplet system induced by a binary level-1 network N and a certain partition of the leaf set of N that will be crucial for showing Theorem 6.3. In Section 6, we first formalize the notion of "uniquely determining" and then present the aforementioned examples for triplet systems. Starting in that section and continuing in Section 7, we investigate saturated, 4-outwards, binary level-1 networks and establish Theorem 6.3 and Theorem 7.3, respectively.

Definitions and Notation
In this section we present only basic definitions and notation to avoid overloading the reader. Concepts such as triplets and (softwired) clusters are formalized in subsequent sections.
Throughout the paper, let X denote a finite set of size n ≥ 2. Also, all graphs G considered have non-empty finite sets of vertices and edges (or arcs in case G is directed) and have no loops or multiple edges (or arcs in case G is directed).
Suppose for the following that G = (V, A) is a directed acyclic graph (DAG). If v and w are vertices of G such that there exists an arc a from v to w in G then we denote that arc by (v, w) and refer to v as the tail of a, denoted by tail(a), and to w as the head of a, denoted by head(a). Suppose v ∈ V is a vertex of G. Then we denote by outdeg(v) the out-degree of v (i.e. the number of arcs whose tail is v) and by indeg(v) the in-degree of v (the number of arcs whose head is v). The sum of the out-degree and the in-degree of v is called the degree of v, denoted by deg(v). [...] If G′ = (V′, A′) is a further rooted DAG with leaf set X then we say that G is equivalent to G′ if there exists a graph isomorphism from G to G′ that is the identity on X.
A phylogenetic network N on X is a rooted DAG whose set of leaves is X and in which every interior vertex v except the root ρ_N is either (i) a split vertex of N, that is, indeg(v) = 1 and outdeg(v) ≥ 2, or (ii) a hybrid vertex of N, that is, indeg(v) ≥ 2 and outdeg(v) ≥ 1. If only the size of X is of relevance to the discussion then we will simply call N a phylogenetic network on |X| leaves, and if the set X is of no relevance to the discussion then we will simply call N a phylogenetic network. We denote the set of hybrid vertices of a phylogenetic network N by H(N) and say that N is binary if the root of N as well as every split vertex of N has out-degree two and every hybrid vertex of N has out-degree one and in-degree two. [...] (We say that a biconnected component is non-trivial if it contains more than one edge.) Let U(N) be the underlying graph of N, i.e. the undirected graph obtained from N by ignoring the orientation of its arcs. We say that a binary phylogenetic network N is a level-k (phylogenetic) network if every biconnected component of U(N) contains at most k hybrid vertices. Reflecting the fact that a cycle of length three in the underlying graph of a phylogenetic network is indistinguishable (from a triplet or cluster perspective) from a split vertex, we follow common practice and will always assume that a cycle in the underlying graph of a level-1 network N contains at least four vertices.
Note that a phylogenetic network N for which H(N) = ∅ holds is simply a rooted phylogenetic tree on X (sensu [18]). Thus, level-0 networks are rooted phylogenetic trees. All phylogenetic trees considered in this article are rooted, so we henceforth drop the "rooted" prefix.
We denote the class of all binary level-1 networks on n ≥ 2 leaves by L_1(n). Alternatively, we will also use L_1(X) to denote that class if we want to emphasize the leaf set X of the networks in L_1(n). Now, suppose that N is a level-k network, k ≥ 1. Then we call N proper if N is not also a level-l network for some 0 ≤ l ≤ k − 1. Note that in case k = 1 such a network must have at least three leaves and at least one hybrid vertex. In that case, we call a non-trivial biconnected component of U(N) with its original directions in N restored a gall of N and denote the set of galls of a level-1 network N by G(N). If N is binary, contains precisely one gall C, and every leaf of N is adjacent with a vertex of C then N is called simple. Together with phylogenetic trees, such networks may be viewed as the building blocks of (proper) level-1 networks [22]. For the convenience of the reader, we present examples of two simple level-1 networks on X = {x_1, ..., x_5} in Fig. 2.
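The vertex types defined in this section can be checked mechanically on small examples. The following is a minimal sketch, not part of the original development: it assumes a network is given as a list of (tail, head) arcs, takes the root to be the unique vertex of in-degree zero and the leaves to be the vertices of out-degree zero, and uses the hypothetical names `classify_vertices`, `is_binary` and `SIMPLE3` purely for illustration.

```python
from collections import defaultdict


def classify_vertices(arcs):
    """Classify every vertex of a rooted DAG given as (tail, head) arcs."""
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    vertices = set()
    for tail, head in arcs:
        outdeg[tail] += 1
        indeg[head] += 1
        vertices.update((tail, head))
    kinds = {}
    for v in vertices:
        if indeg[v] == 0:
            kinds[v] = "root"      # assumption: in-degree zero marks the root
        elif outdeg[v] == 0:
            kinds[v] = "leaf"      # assumption: out-degree zero marks a leaf
        elif indeg[v] == 1 and outdeg[v] >= 2:
            kinds[v] = "split"
        elif indeg[v] >= 2 and outdeg[v] >= 1:
            kinds[v] = "hybrid"
        else:
            kinds[v] = "invalid"   # in-degree one and out-degree one
    return kinds, indeg, outdeg


def is_binary(arcs):
    """Check the degree conditions of a binary phylogenetic network."""
    kinds, indeg, outdeg = classify_vertices(arcs)
    for v, kind in kinds.items():
        if kind == "invalid":
            return False
        if kind in ("root", "split") and outdeg[v] != 2:
            return False
        if kind == "hybrid" and (indeg[v], outdeg[v]) != (2, 1):
            return False
    return True


# A simple level-1 network on {a, b, c}: one gall r, u1, u2, h with hybrid h.
SIMPLE3 = [("r", "u1"), ("r", "u2"), ("u1", "a"), ("u1", "h"),
           ("u2", "b"), ("u2", "h"), ("h", "c")]

kinds, _, _ = classify_vertices(SIMPLE3)
hybrids = sorted(v for v, kind in kinds.items() if kind == "hybrid")
print(hybrids, is_binary(SIMPLE3))
```

The sketch only inspects degrees; it does not test acyclicity or the level of the network.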

Counting arcs, vertices and galls
In this section, we present some enumerative results concerning the number of vertices, arcs, and galls of a level-1 network. We start by introducing some relevant notation. Suppose N is a phylogenetic network on X. Following [26], we say that a phylogenetic tree T on X is displayed by N if there exists a subgraph N′ of N that is a subdivision of T, i.e. T can be obtained from N′ by repeatedly suppressing vertices with in-degree and out-degree both equal to 1. For N a level-1 network, we denote the number of galls of N by g(N), that is, we let g(N) = |G(N)|.

Counting arcs and vertices.
In case N is a binary level-0 network on n ≥ 2 leaves, that is, N is a binary phylogenetic tree on n leaves, it is easy to see that N has 2n − 1 vertices and 2(n − 1) edges (see e.g. [18, Proposition 2.1.3] for the corresponding result for unrooted binary phylogenetic trees). For the more general case that N is a binary, proper, level-k network on n ≥ 2 (and thus on n ≥ 3) leaves, and k ≥ 1, it was shown in [21, Lemma 4.5] that any such network can contain at most 2n − 1 + k(n − 1) vertices and at most 2n − 2 + (3/2)k(n − 1) arcs. Denoting for n ≥ 3 the subclass of all proper level-1 networks in L_1(n) by L_1(n)⁻, the sizes of the vertex and arc sets of a network N = (V, A) in L_1(n)⁻ can thus be at most 3n − 2 and 3.5(n − 1), respectively. Moreover, it follows from [21, Lemma 4.4] that |V| = 2n + 1 = |A| holds in the special case that N is simple. The next result indicates that the size of the vertex set of a simple level-1 network lends itself to providing lower bounds on the sizes of the vertex set and arc set of a general proper level-1 network, respectively.
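Lemma 3.1. Suppose n ≥ 3 and N = (V, A) ∈ L_1(n)⁻. Then 2n + 1 ≤ |V| ≤ 3n − 2 and 2n + 1 ≤ |A| ≤ 3.5(n − 1). These bounds are tight if n = 3, in which case N must be a simple level-1 network.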
Proof. Suppose X has size n and assume that N = (V, A) is a network in L_1(n)⁻ with leaf set X. The upper bounds have already been established in the above discussion, so it suffices to prove that 2n + 1 ≤ |V| and 2n + 1 ≤ |A|. Let g = g(N) and note that g ≥ 1 holds because N is assumed to be proper. We start by adding a new taxon y ∉ X just above the root of N, in the following way: introduce a new vertex u, add an arc from u to the root of N, and add an arc from u to y. Let N_y be this new network, whose root is u. Then N_y has |V| + 2 vertices and |A| + 2 arcs. Let T = (V_T, A_T) be any binary phylogenetic tree on X ∪ {y} that is displayed by N_y. Then there exists a subgraph T′ of N_y that is a subdivision of T. Observe that T′ must contain 2g vertices whose in-degree and out-degree (in T′) are both equal to 1. Specifically, g of them are hybrid vertices of N_y and the other g are tails of arcs in N_y whose head is contained in H(N_y). (The correctness of these claims requires the root of T′ to be the same vertex as the root of N_y, and this is the reason for the addition of y in the first place.) Hence, |V_T| = (|V| + 2) − 2g. Noting that T has 2(n + 1) − 1 vertices because it is binary, we consequently have |V| = 2n − 1 + 2g ≥ 2n + 1, where the last inequality follows because g ≥ 1. Similarly, to obtain T′ from N_y, we need to delete for every hybrid vertex h ∈ H(N_y) precisely one of its incoming arcs (v, h). Hence, both v and h will have in-degree and out-degree 1 in T′. (Note that v might be the root of the gall that contains h in N_y.) Hence, |A_T| = (|A| + 2) − g − 2g. Noting (again, because it is binary) that T has 2(n + 1) − 2 arcs, we have |A| = 2n − 2 + 3g ≥ 2n + 1, where, as before, the last inequality follows because g ≥ 1.
It can easily be verified that the bounds are tight in the case n = 3. Specifically, all expressions evaluate to 7. Finally, if n = 3 then N must be a simple level-1 network, because otherwise it would either be a tree (and thus not proper) or violate the assumption that every cycle in the underlying graph of N contains at least four vertices.
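The counts |V| = 2n + 1 = |A| for simple networks, and the bounds discussed above, can be checked on small instances. The sketch below is illustrative only: `simple_network` and `within_bounds` are hypothetical helper names, the internal vertex labels ("r", "h", "p0", "q0", ...) are assumptions of the sketch and must not clash with leaf labels, and the bounds tested are 2n + 1 ≤ |V| ≤ 3n − 2 and 2n + 1 ≤ |A| ≤ 3.5(n − 1).

```python
def simple_network(left, right, hybrid_leaf):
    """Arcs of a simple level-1 network.

    `left` and `right` list the leaves hanging off the two root-to-hybrid
    paths of the single gall; `hybrid_leaf` is the leaf below the hybrid.
    """
    if len(left) + len(right) < 2:
        raise ValueError("the gall needs at least four vertices")
    arcs = []
    for side, leaves in (("p", left), ("q", right)):
        prev = "r"
        for i, leaf in enumerate(leaves):
            v = f"{side}{i}"
            arcs.append((prev, v))   # arc along the gall
            arcs.append((v, leaf))   # trivial cut arc to the leaf
            prev = v
        arcs.append((prev, "h"))     # arc entering the hybrid vertex
    arcs.append(("h", hybrid_leaf))
    return arcs


def sizes(arcs):
    """Return (number of leaves, number of vertices, number of arcs)."""
    vertices = {v for arc in arcs for v in arc}
    tails = {tail for tail, _ in arcs}
    return len(vertices - tails), len(vertices), len(arcs)


def within_bounds(arcs):
    """Check 2n+1 <= |V| <= 3n-2 and 2n+1 <= |A| <= 3.5(n-1)."""
    n, num_v, num_a = sizes(arcs)
    return (2 * n + 1 <= num_v <= 3 * n - 2
            and 2 * n + 1 <= num_a and 2 * num_a <= 7 * (n - 1))


for left, right in ((["a", "b"], []), (["a", "b"], ["c"]), (["a", "b"], ["c", "d"])):
    net = simple_network(left, right, "z")
    n, num_v, num_a = sizes(net)
    print(n, num_v, num_a, within_bounds(net))
```

For n = 3 the lower and upper bounds coincide at 7, matching the tightness statement above.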

Counting galls.
We next establish a formula for counting the number of galls of a level-1 network. To this end, we require further terminology. Suppose G is a rooted DAG and a is a cut arc of G, that is, an arc of G whose removal disconnects G. If head(a) is a leaf of G then we call a a trivial cut arc of G and a non-trivial cut arc of G otherwise. We denote the number of non-trivial cut arcs of a level-1 network N by c_N.
Suppose N is a level-1 network on X. For a gall C of N, we call an arc of N whose tail but not its head is a vertex of C an outgoing arc of C. Note that our assumption that every cycle in U(N) has at least four vertices implies that every gall must have at least three outgoing arcs. Moreover, if N is binary then we call two distinct leaves x and y of N a cherry of N if x and y have a common parent. Note that this parent must be a split vertex of N. In addition, if N is a binary phylogenetic tree and |X| = 3 then N is called a triplet (on X). Saying that a vertex v of a rooted DAG G is below a vertex w of G if w lies on a directed path from the root of G to v but is distinct from v, we denote a triplet t on X = {a, b, c} for which the last common ancestor of a and b is below the root of t by ab|c (or equivalently c|ab). Finally, a collection R of triplets is called a triplet system (on ⋃_{t∈R} L(t)).
Theorem 3.2. Let n ≥ 2 and suppose N ∈ L_1(n). Then g(N) ≤ n − c_N − 2 and this bound is tight if either N is a phylogenetic tree or every gall of N has exactly three outgoing arcs.
Proof. We prove the theorem by induction on n ≥ 2. Suppose N ∈ L_1(n). Then the stated inequality clearly holds in the form of an equality for n = 2 since in that case N is a phylogenetic tree. It also holds for n = 3 because in that case N is either a triplet and so has one non-trivial cut arc but no gall, or N is a simple level-1 network and so has precisely one gall but no non-trivial cut arcs.
Suppose that N has n ≥ 4 leaves and assume that g(N′) ≤ n′ − c_{N′} − 2 holds for all networks N′ ∈ L_1(n′) with 2 ≤ n′ ≤ n − 1. The stated bound holds with equality in case N is a phylogenetic tree, as in that case g(N) = 0 and every non-trivial cut arc of N is an interior edge of N, of which there are n − 2. So assume that N ∈ L_1(n)⁻. To see the stated bound for g(N), we distinguish between the cases that (i) N contains a gall C whose outgoing arcs are all trivial cut arcs and (ii) that this is not the case, that is, N contains a cherry.
Assume first that Case (i) holds. We distinguish the cases that C has three outgoing arcs and that C has at least four outgoing arcs. Assume first that C has at least four outgoing arcs. Then there must exist a leaf a of N that is the head of an outgoing arc of C but is not adjacent with the unique hybrid vertex of C. Consider the rooted DAG N′ obtained from N by first removing a, its parent a′ ∈ V(N), and all arcs incident with a′ and then adding a new arc from the parent of a′ to the child of a′ contained in V(C). Clearly, N′ is a binary level-1 network on L(N) \ {a} and g(N) = g(N′) and c_N = c_{N′}. Hence, g(N) = g(N′) ≤ (n − 1) − c_{N′} − 2 < n − c_N − 2 by the induction hypothesis.
Next, assume that C has exactly three outgoing arcs a_1, a_2, a_3. Let N′ be the rooted DAG obtained from N by contracting C as well as a_1, a_2, and a_3 into a new leaf x. Clearly, N′ is a binary level-1 network on (L(N) ∪ {x}) \ {head(a_1), head(a_2), head(a_3)} and g(N′) = g(N) − 1. Since n ≥ 4, the incoming arc of the root of C is a non-trivial cut arc of N that becomes a trivial cut arc of N′, so c_{N′} = c_N − 1. Hence, g(N) = g(N′) + 1 ≤ (n − 2) − (c_N − 1) − 2 + 1 = n − c_N − 2 by the induction hypothesis.
Last but not least, assume that Case (ii) holds, that is, N contains two leaves x and y that form a cherry. Let N′ denote the rooted DAG obtained from N by first deleting x, its parent p, and all arcs incident with p and then adding a new arc from the parent of p to y. Then N′ is a binary level-1 network on L(N) \ {x} with g(N′) = g(N) and c_{N′} = c_N − 1, because the incoming arc of p is a non-trivial cut arc of N whereas the new arc is a trivial cut arc of N′. Hence, g(N) = g(N′) ≤ (n − 1) − (c_N − 1) − 2 = n − c_N − 2 by the induction hypothesis. This concludes the proof of this case and thus the proof of the stated bound for g(N).
It remains to establish that the stated bound for g(N) is tight for a level-1 network N ∈ L_1(n) for which all of its galls have precisely three outgoing arcs. To see this, one can again perform induction on n ≥ 2 but this time assuming that g(N′) = n − c_{N′} − 2 holds for all level-1 networks N′ ∈ L_1(n − 1) for which every gall has precisely three outgoing arcs. In this context it should be noted that the cases n ∈ {2, 3} and N is a phylogenetic tree on n leaves have already been observed above. We leave the details to the interested reader.
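The quantities in Theorem 3.2 are easy to compute for small networks, which gives a quick sanity check of the bound g(N) ≤ n − c_N − 2. The sketch below is a brute-force illustration under stated assumptions: cut arcs are found by deleting each arc in turn and testing connectivity of the underlying graph, and the number of galls is taken to be the cyclomatic number |A| − |V| + 1, which is only valid because the cycles of a level-1 network are vertex-disjoint. The names `gall_statistics`, `GALL_PLUS_LEAF` and `SIMPLE4` are hypothetical.

```python
from collections import defaultdict, deque


def is_connected_without(arcs, skip):
    """True if the underlying undirected graph is connected once `skip` is removed."""
    adjacency = defaultdict(set)
    vertices = set()
    for arc in arcs:
        vertices.update(arc)
        if arc != skip:
            u, v = arc
            adjacency[u].add(v)
            adjacency[v].add(u)
    start = next(iter(vertices))
    seen = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == vertices


def gall_statistics(arcs):
    """Return (n, c_N, g(N)) for a binary level-1 network given by its arcs."""
    vertices = {v for arc in arcs for v in arc}
    leaves = vertices - {tail for tail, _ in arcs}
    cut_arcs = [a for a in arcs if not is_connected_without(arcs, a)]
    nontrivial = [a for a in cut_arcs if a[1] not in leaves]
    galls = len(arcs) - len(vertices) + 1  # assumes vertex-disjoint cycles
    return len(leaves), len(nontrivial), galls


# One gall with exactly three outgoing arcs, attached below the root next to leaf d.
GALL_PLUS_LEAF = [("r", "s"), ("r", "d"), ("s", "u1"), ("s", "u2"), ("u1", "a"),
                  ("u1", "h"), ("u2", "b"), ("u2", "h"), ("h", "c")]
# A simple network on four leaves: its gall has four outgoing arcs.
SIMPLE4 = [("r", "p0"), ("p0", "a"), ("p0", "p1"), ("p1", "b"), ("p1", "h"),
           ("r", "q0"), ("q0", "c"), ("q0", "h"), ("h", "d")]

for net in (GALL_PLUS_LEAF, SIMPLE4):
    n, c, g = gall_statistics(net)
    print(n, c, g, g <= n - c - 2)
```

In the first example the bound holds with equality, in line with the tightness condition on galls with exactly three outgoing arcs; in the second it is strict.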

Counting clusters and triplets
In this section we establish enumerative results for computing the sizes of the so-called hardwired and softwired cluster systems, respectively, both of which have been introduced in the literature for phylogenetic network reconstruction [13]. In addition, we establish that the corresponding result for triplets does not hold. We start with clusters.
4.1. Counting clusters.
We call a non-empty subset of X a cluster and refer to a set of clusters of X as a cluster system on X, or just a cluster system if the set X is clear from the context. Suppose for the following that N is a phylogenetic network on X and that v ∈ V(N). Then we define the cluster C_N(v) associated with v to comprise all leaves of N that are below v and let C_N(v) = {v} in case v is a leaf of N. Again, we simplify our notation by writing C(v) rather than C_N(v) if N is clear from the context. Note that C(ρ_N) = X. Then the hardwired cluster system C(N) associated with N is the cluster system {C(v) : v ∈ V(N)}. [...] where n denotes |X| in both cases. Denoting by T(N) the set of phylogenetic trees on X displayed by N, the softwired cluster system S(N) associated with N is defined as S(N) = ⋃_{T ∈ T(N)} C(T).
It is not too difficult to argue that S(N) contains O(n) clusters. To see this, let T be a tree on X displayed by N and let v be a vertex of T. From the definition of display it follows that a subdivision of T can be topologically embedded within N. Fix such an embedding, and let T′ and v′ be the images of T and v in N. If v′ is the head of a cut arc in N, or the root of N, then C_T(v) will be equal to C_N(v′), irrespective of the exact embedding. If v′ is neither the head of a cut arc nor the root, then it is a vertex of some gall of N. In that case, there are (at most) two possibilities for C_T(v). Specifically, the choice of cluster is completely determined by which of the two arcs incoming to the hybrid vertex of the gall is in T′ (irrespective of the exact topology of the embedding). Now, by Lemma 3.1, N contains O(n) vertices. Given that (as argued) each vertex can contribute at most two clusters, it follows that S(N) contains O(n) clusters. The next result improves on this O(n) observation by providing a formula for the size of the closely related cluster system S(N)⁻ := S(N) \ {X}. (This is also an improvement on the result presented in [8, Proposition 3].) To establish it, we require further terminology.
Suppose N ∈ L_1(X) and X′ ⊆ X. Then we define the restriction N|_{X′} of N to X′ to be the network in L_1(X′) obtained from N by deleting all vertices in X − X′ and then applying the following "cleaning up" operations in any order until no more can be applied¹: (i) suppressing vertices with in-degree and out-degree both equal to one; (ii) deleting vertices with out-degree zero that are not an element in X; (iii) collapsing multi-arcs into a single arc; (iv) if a gall G has been created that has exactly two outgoing cut arcs (u, v), (u′, v′), then deleting these two cut arcs and all the arcs of G and adding arcs (r, v) and (r, v′), where r is the unique vertex of G whose children are u and u′; (v) deleting vertices with in-degree zero and out-degree one. (Note that if N is a phylogenetic tree this definition specializes to the usual definition of "restriction" used in the tree literature.) We often write N|_{X−x} as shorthand for N|_{X−{x}}.
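For small networks the softwired cluster system defined in this section can be enumerated directly, which makes it possible to compare |S(N)⁻| with the expression 3n − 4 − c_N used in the proof below. The sketch is illustrative only and rests on two assumptions: a displayed tree is obtained by keeping exactly one incoming arc per hybrid vertex, and suppressing vertices of in-degree and out-degree one does not change the set of clusters, so the clusters can be read off before suppression. The names `softwired_clusters`, `softwired_minus` and the two example networks are hypothetical, and the values of c_N quoted in the comments were determined by inspection.

```python
from itertools import product


def softwired_clusters(arcs):
    """All clusters of trees displayed by a binary level-1 network (including X)."""
    children, parents = {}, {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
        parents.setdefault(head, []).append(tail)
    vertices = {v for arc in arcs for v in arc}
    leaves = {v for v in vertices if v not in children}
    hybrids = [v for v in vertices if len(parents.get(v, [])) >= 2]

    def leaves_below(v, kept):
        # Leaves reachable from v using only the arcs that were kept.
        if v in leaves:
            return frozenset([v])
        result = frozenset()
        for child in children[v]:
            if (v, child) in kept:
                result |= leaves_below(child, kept)
        return result

    clusters = set()
    # One switching = one choice of retained parent for every hybrid vertex.
    for choice in product(*(parents[h] for h in hybrids)):
        dropped = {(p, h) for h, keep in zip(hybrids, choice)
                   for p in parents[h] if p != keep}
        kept = set(arcs) - dropped
        for v in vertices:
            cluster = leaves_below(v, kept)
            if cluster:
                clusters.add(cluster)
    return clusters


def softwired_minus(arcs):
    """The cluster system S(N) with the full leaf set X removed."""
    clusters = softwired_clusters(arcs)
    full = frozenset().union(*clusters)
    return clusters - {full}


# Simple network on {a, b, c}: n = 3 and no non-trivial cut arc.
SIMPLE3 = [("r", "u1"), ("r", "u2"), ("u1", "a"), ("u1", "h"),
           ("u2", "b"), ("u2", "h"), ("h", "c")]
# Gall below the root next to leaf d: n = 4, one non-trivial cut arc (r, s).
GALL_PLUS_LEAF = [("r", "s"), ("r", "d"), ("s", "u1"), ("s", "u2"), ("u1", "a"),
                  ("u1", "h"), ("u2", "b"), ("u2", "h"), ("h", "c")]

print(len(softwired_minus(SIMPLE3)), 3 * 3 - 4 - 0)
print(len(softwired_minus(GALL_PLUS_LEAF)), 3 * 4 - 4 - 1)
```

The enumeration is exponential in the number of hybrid vertices, so it is meant for checking examples rather than for practical use.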
Theorem 4.1. Suppose n ≥ 2 and N ∈ L_1(n). Then |S(N)⁻| = 3n − 4 − c_N.
Proof. We prove the theorem by induction on n ≥ 2. Suppose N ∈ L_1(n). Then the stated equality holds if n = 2, as then N is a phylogenetic tree on two leaves, and if n = 3 because in that case N is either simple and so c_N = 0 holds or N is a triplet. In the former case, |T(N)| = 2 and both phylogenetic trees contained in T(N) are triplets. Thus, |S(N)⁻| = 5 = 3n − 4 − c_N holds in this case. In the latter case, c_N = 1 follows and thus |S(N)⁻| = 4 = 3n − 4 − c_N.
Now suppose n > 3 and assume that the theorem holds for all networks N′ with at most n − 1 leaves. Let X = L(N). We distinguish between the cases that every cut arc of N is trivial and the case that N contains at least one non-trivial cut arc.
Suppose first that every cut arc of N is trivial. Then c_N = 0 and N is simple. Note that since n > 3, at least one of the two directed paths from the root ρ_N to the hybrid vertex h_N of N must contain at least two vertices distinct from ρ_N and h_N. Let P_1 denote such a path. Moreover, let v ∈ V(P_1) denote the vertex on P_1 that is adjacent with ρ_N and let x ∈ X denote the leaf of N that is adjacent with v. Let X′ = X − {x} and N′ = N|_{X′}. Clearly N′ ∈ L_1(n − 1) and c_{N′} = c_N = 0. Thus, |S(N′)⁻| = 3(n − 1) − 4, by the induction hypothesis. Observe that the definition of S(N) implies that S(N)⁻ contains exactly three clusters that S(N′)⁻ does not. Indeed, in case the other directed path from ρ_N to h_N also contains vertices distinct from ρ_N and h_N, the three clusters missing from S(N′)⁻ are {x}, C_N(v) \ {h} and C_N(v), where h is the leaf below h_N. Otherwise, the three clusters missing from S(N′)⁻ are {x}, C_N(v) \ {h} and C_N(v′), where v′ is the child of v that is not contained in X. Consequently, |S(N)⁻| = |S(N′)⁻| + 3 = 3(n − 1) − 4 + 3 = 3n − 4 − c_N. [...] Consider the rooted DAG N_1 with leaf set Y_1 obtained from N by deleting all vertices (plus their incident arcs) that are not below v and the rooted DAG N_2 on Y_2 ∪ {v} obtained from N by deleting all vertices below v (plus their incident arcs). Since [...]
¹ In this paper only a subset of these "cleaning up" operations will be required. However, we list them all to retain consistency with the wider literature.
4.2. Counting triplets.
In view of the close relationship between the cluster system C(T) induced by a phylogenetic tree T on at least three leaves and the triplet system R(T) induced by T (see e.g. [7] or [24]) it is reasonable to hope that the companion result to Theorem 4.1 might hold for the triplet system R(N) induced by a phylogenetic network N on at least three leaves. Put differently, it should be possible to express the size of R(N) in terms of the number of galls and non-trivial cut arcs of N. As the next result shows, this is not the case. We start with defining the triplet system R(N).
Suppose N ∈ L_1(X), where |X| ≥ 3 and a, b, and c are distinct elements in X. Then the triplet ab|c is said to be consistent with N if there exist distinct vertices v and w in N and directed paths in N from v to c and w, respectively, and from w to a and b, respectively, such that any pair of those paths does not have an interior vertex in common. The triplet system R(N) is then the set of all triplets t with L(t) ⊆ X that are consistent with N.
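The path-based definition of consistency can be tested by brute force on very small networks. The following sketch is a direct, unoptimized transcription of that definition under the assumption that the network is binary and given as a list of (tail, head) arcs; a triplet ab|c is stored as the pair (frozenset({a, b}), c). The names `all_paths`, `consistent`, `triplet_system` and the example networks are hypothetical.

```python
from itertools import combinations, product


def all_paths(children, source, target):
    """All directed paths from source to target, each as a list of vertices."""
    if source == target:
        return [[source]]
    paths = []
    for child in children.get(source, []):
        for rest in all_paths(children, child, target):
            paths.append([source] + rest)
    return paths


def consistent(children, vertices, a, b, c):
    """True if the triplet ab|c is consistent with the network."""
    for v, w in product(vertices, repeat=2):
        if v == w:
            continue
        candidates = product(all_paths(children, v, c), all_paths(children, v, w),
                             all_paths(children, w, a), all_paths(children, w, b))
        for paths in candidates:
            interiors = [set(path[1:-1]) for path in paths]
            # No two of the four paths may share an interior vertex.
            if all(p.isdisjoint(q) for p, q in combinations(interiors, 2)):
                return True
    return False


def triplet_system(arcs):
    """The set of all triplets on the leaf set that are consistent with the network."""
    children = {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
    vertices = {v for arc in arcs for v in arc}
    leaves = sorted(v for v in vertices if v not in children)
    triplets = set()
    for x, y, z in combinations(leaves, 3):
        for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
            if consistent(children, vertices, a, b, c):
                triplets.add((frozenset((a, b)), c))
    return triplets


# Simple network on {a, b, c} with hybrid leaf c, and the triplet tree ab|c.
SIMPLE3 = [("r", "u1"), ("r", "u2"), ("u1", "a"), ("u1", "h"),
           ("u2", "b"), ("u2", "h"), ("h", "c")]
TREE = [("r", "x"), ("r", "c"), ("x", "a"), ("x", "b")]

print(sorted((sorted(pair), c) for pair, c in triplet_system(SIMPLE3)))
print(sorted((sorted(pair), c) for pair, c in triplet_system(TREE)))
```

On the simple network the two triplets found are ac|b and bc|a, one for each choice of incoming arc of the hybrid vertex, whereas the tree induces only ab|c.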
To illustrate this definition consider the simple level-1 network N_2 on X = {x_1, ..., x_5} depicted in Fig. 2(ii). Then R(N_2) comprises the sixteen triplets [...]
Proof. Let X′ denote a finite set of size at least two and let a, b, c, and d denote pairwise distinct elements not contained in X′. Consider the binary level-1 networks N_1 and N_2 on X := X′ ∪ {a, b, c, d} depicted in Fig. 4, where the triangle marked T denotes some binary phylogenetic tree on X′.
As can be easily checked, N_1 and N_2 have the same number of leaves and both contain one gall and have c_T + 3 non-trivial cut arcs. Moreover, for any 3-set Y ∈ $\binom{X}{3}$, there exists exactly one triplet on Y that is contained in R(N_1) except for Y = {a, b, c}, for which a|bc, c|ab ∈ R(N_1) holds. Hence, |R(N_1)| = $\binom{n}{3}$ + 1, where n = |X|. Similarly, for every 3-subset Y ∈ $\binom{X}{3}$, there exists exactly one triplet on [...]

Triplet systems and the partition Cut(N )
In this section, we start turning our attention to the question of how many triplets suffice to uniquely determine a binary level-1 network.Central to our arguments will be a special type of subsets of X called SN-sets which were originally introduced in [15] and further studied in, for example, [22,23].
Suppose |X| ≥ 3 and R is a triplet system on X. Then a subset S ⊆ X is called an SN-set of R if there is no triplet xy|z ∈ R with x, z ∈ S and y ∉ S. In addition, such a set S is called non-trivial if S ≠ X. Last but not least, a non-trivial SN-set S for R is called maximal if there is no non-trivial SN-set that is a strict superset of S.
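The SN-set condition can be checked mechanically. The following Python sketch is illustrative only: it assumes that a triplet xy|z is stored as a tuple (x, y, z), and the function name is_sn_set is a hypothetical helper, not notation from the paper. Because xy|z and yx|z denote the same triplet, the condition is violated exactly when z lies in S together with precisely one of x and y.

```python
def is_sn_set(S, triplets):
    """Return True if S is an SN-set of the triplet system `triplets`.

    Assumed encoding: the triplet xy|z is the tuple (x, y, z).
    """
    S = set(S)
    for x, y, z in triplets:
        # Violation: z and exactly one of x, y lie in S (xy|z equals yx|z).
        if z in S and ((x in S) != (y in S)):
            return False
    return True


# Triplets induced by the caterpillar tree (((a, b), c), d).
R = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("b", "c", "d")}
print(is_sn_set({"a", "b", "c"}, R))  # True
print(is_sn_set({"a", "d"}, R))       # False: ab|d has a, d in S but b outside
```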
As it turns out, for a binary network N (of any level) the SN-sets of R(N) are closely related to the cut arcs of N in the sense that if (u, v) is a cut arc of N, then C_N(v) is an SN-set of R(N) because there cannot exist a triplet xy|z ∈ R(N) such that x, z ∈ C_N(v) and y ∉ C_N(v). We call a cut arc (u, v) of N highest if there does not exist a cut arc (u′, v′) of N such that there is a directed path from v′ to u. We denote by Cut(N) the partition of X induced by, for each highest cut arc (u, v) of N, taking the cluster C_N(v) of X. By [23, Observation 3], Cut(N) is exactly the set of maximal SN-sets of R(N).
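The correspondence between highest cut arcs and maximal SN-sets can be tried out on a rooted tree, used here only as the simplest stand-in for a network: every arc of a tree is a cut arc, so the highest cut arcs are the arcs leaving the root. The sketch below makes several assumptions of its own: trees are nested tuples with string leaves, triplets xy|z are tuples (x, y, z), and maximal SN-sets are found by brute force over all proper subsets, which is only feasible for very small X.

```python
from itertools import combinations


def leaves(tree):
    """Leaf set of a tree given as nested tuples with string leaves."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def tree_triplets(tree):
    """All triplets xy|z of a rooted tree, encoded as tuples (x, y, z)."""
    if isinstance(tree, str):
        return set()
    result = set()
    all_leaves = leaves(tree)
    for child in tree:
        result |= tree_triplets(child)
        inside = sorted(leaves(child))
        # x, y below the same child and z below another child gives xy|z.
        for x, y in combinations(inside, 2):
            for z in all_leaves - set(inside):
                result.add((x, y, z))
    return result


def maximal_sn_sets(X, triplets):
    """Brute-force the maximal non-trivial SN-sets of a triplet system."""
    X = sorted(X)

    def is_sn(S):
        return all(not (z in S and ((x in S) != (y in S)))
                   for x, y, z in triplets)

    sn = [frozenset(c) for k in range(1, len(X))
          for c in combinations(X, k) if is_sn(frozenset(c))]
    return {s for s in sn if not any(s < t for t in sn)}


T = ((("a", "b"), "c"), ("d", "e"))
R = tree_triplets(T)
# Clusters below the arcs leaving the root, i.e. the highest cut arcs of T.
cut = {frozenset(leaves(child)) for child in T}
print(maximal_sn_sets(leaves(T), R) == cut)  # True
```

For this tree both sides equal {{a, b, c}, {d, e}}. A network with galls would need a different routine to compute R(N), which this sketch does not attempt.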
We begin with an auxiliary observation which relies on the concept of compatibility of pairs of sets, whereby two non-empty finite sets A and B are called compatible if A ∩ B ∈ {∅, A, B} holds and incompatible otherwise. More generally, a cluster system C is called compatible if any two clusters in C are compatible and incompatible otherwise (see e.g. [18, Section 3.5] and [7] for more on such objects).
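As a small illustration of this definition, the following Python sketch tests pairwise compatibility and compatibility of a whole cluster system. The function names compatible and is_compatible_system are hypothetical, and clusters are assumed to be given as non-empty Python sets.

```python
from itertools import combinations


def compatible(A, B):
    """Two non-empty sets are compatible if A ∩ B is empty, A, or B."""
    A, B = frozenset(A), frozenset(B)
    return (A & B) in (frozenset(), A, B)


def is_compatible_system(clusters):
    """A cluster system is compatible if all pairs of clusters are."""
    return all(compatible(A, B) for A, B in combinations(list(clusters), 2))


print(compatible({"x1", "x2"}, {"x1", "x2", "x3"}))  # True: nested
print(compatible({"x1", "x2"}, {"x2", "x3"}))        # False: overlap
print(is_compatible_system([{"x1"}, {"x1", "x2"}, {"x3"}]))  # True
```

Overlapping but non-nested pairs such as {x1, x2} and {x2, x3} are exactly the configurations exploited in the contradiction arguments of this and the following sections.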
Observation 1. Suppose that n ≥ 3 and that N and ) denote two split vertices that are heads of cut arcs of N and N′, respectively. Then the induced clusters C_N(v) and Then, out of the three possible triplets with leaf set {x, y, z}, only the triplet xy|z can be contained in R(N). Hence, xy|z ∈ R(N′) and, so, C′ cannot be an SN-set of R(N′); a contradiction as the incoming arc of v′ is a cut arc of N′ and, so, C′ must be an SN-set of R(N′). Thus, C and C′ must be compatible.
To see the remainder of the observation, assume that C ⊊ C′. Then since R(N) ⊆ R(N′) and C′ is an SN-set of R(N′), it follows that C′ is also an SN-set of R(N). Since C is also an SN-set of R(N), it cannot be a maximal SN-set of R(N).
The next result will be required by the induction argument that we will use in the proof of Theorem 6.3. The proof of the proposition relies on the facts that for any saturated network N ∈ L_1(X) (i) the partition Cut(N) contains at least three elements and (ii) there exists a gall B of N such that the root of N is a vertex of B.
Figure 5. The structure of networks (i) N and (ii) N′ considered in the proof of Proposition 5.1.
Proof. The proof contains multiple parts so we first describe it at a high level, and then give details. The entire proof is devoted to proving that Cut(N) ⊆ Cut(N′), and after this Cut(N) = Cut(N′) follows immediately from the fact that Cut(N) and Cut(N′) are both partitions of X. The proof that Cut(N) ⊆ Cut(N′) holds is a long proof by contradiction, which starts with the assumption that there exists some C ∈ Cut(N) − Cut(N′). We then show that |C| ≥ 2 must hold. Combined with the assumption that N is saturated we then infer that, up to symmetry, the structure of N is as indicated in Fig. 5(i). Choosing elements x, p, q ∈ X as described below we obtain that R(N) must contain two distinct triplets t_1 and t_2 with leaf set {x, p, q}. By examining the structure of Cut(N′) we identify that at least one of two cases, referred to as Cases (i) and (ii) below, must hold. However, we show that Case (i) cannot hold, and thus conclude that Case (ii) must hold. We then show that, up to symmetry, the structure of N′ is as indicated in Fig. 5(ii). We argue that x, p, q are below three distinct highest cut arcs of N′, and that none of these is the cut arc incident to the hybridization vertex of B′ (where B′ is the topmost gall of N′, as shown in Fig. 5(ii)). This implies that R(N′) cannot contain both t_1 and t_2, which finally yields the required contradiction.
Let us then start by assuming, for the sake of contradiction, that there exists some
Proof that |C| ≥ 2: Since both Cut(N) and Cut(N′) are partitions of X, there exists some recalled above, C is a maximal SN-set of R(N), and, by Observation 1, C and C′ are compatible, it follows that C′ ⊊ C. Thus, there exists a further element
The structure of N: Let r ∈ V(N) denote the head of the cut arc (r′, r) of N for which C = C_N(r) holds and let B_r denote the gall of N that contains r in its underlying cycle (which exists because |C| ≥ 2 and N is saturated). In view of the usual assumption that no gall in a phylogenetic network has two or fewer outgoing cut arcs, B_r has at least three outgoing cut arcs c_1, c_2 and c_3 (see Fig. 5(i)). Let c_1 denote the outgoing cut arc of B_r whose tail is the hybrid vertex Clearly, R(N) contains two distinct triplets t and t′ on {x, y, z}.
The structure of N′: Since R(N) ⊆ R(N′) we also have t, t′ ∈ R(N′). Since Cut(N′) is the partition of X induced by the maximal SN-sets of R(N′), it follows that either (i) there exists some element A ∈ Cut(N′) such that x, y, z ∈ A or (ii) there exist distinct elements C_x, C_y, C_z ∈ Cut(N′) such that a ∈ C_a, for all a ∈ {x, y, z}.
Assume first that Case (i) holds. We claim that C ⊆ A. To see this, note that we were free to choose any two cut arcs c_2 and c_3 distinct from c_1 and subsequently we had a free choice of z, x, y. For any Z := {x, y, z} chosen this way (let us call this a valid choice) it is straightforward to see that there exist two triplets on Z in R(N) and thus in R(N′). Since A is an SN-set of R(N′) it follows that as soon as two of the three leaves of a triplet on Z are contained in A, so too is the third. Now, let {x, y, z} be our initial valid choice, so by assumption {x, y, z} ⊆ A. Simple case analysis shows that for any element p ∈ C, at least one of {x, y, p}, {x, p, z} or {p, y, z} is a valid choice. Hence, p ∈ A, which proves the claim. Since C ∉ Cut(N′) we have in fact C ≠ A. But C ⊊ A cannot hold either because C is a maximal SN-set for R(N) and A is a maximal SN-set for R(N′), and by Observation 1 this cannot happen. Thus, Case (ii) must hold (see Fig. 5(ii)).
The triplets t_1 and t_2: Let h ∈ V(N) denote the hybrid vertex of the topmost gall K of N, that is, the gall of N that contains the root of N in its vertex set (which must exist because N is saturated). Also note that because C ∈ Cut(N) it follows that (r′, r) is a highest cut arc of N and thus r′ is a vertex of K. Since |Cut(N)| ≥ 3 there exist distinct elements Choose some p ∈ C_1 and some q ∈ C_2. Combined with the definition of R(N) it follows that R(N) must contain two triplets t_1 and t_2 on {x, p, q}, two triplets on {y, p, q}, and two triplets on {z, p, q}. Note that since R(N) ⊆ R(N′), those six triplets are also contained in R(N′). (In the next part of the proof we assume C_{N′}(h′) = C_z, where these terms will be defined in due course, and the critical point here is that C_{N′}(h′) ≠ C_x. This is genuinely without loss of generality because when selecting t_1 and t_2 in the present part of the proof it does not matter whether they are on {x, p, q}, {y, p, q} or {z, p, q}.)
The taxa x, p, q are all beneath distinct highest cut arcs of N′, but none of these are incident to the hybridization vertex: Using x, y, z, p and q, we next analyze the structure of Cut(N′) (see Fig. 5(ii)). Observe first that since |Cut(N′)| ≥ 3, the root of N′ must be contained in a gall B′ of N′. Let h′ ∈ V(N′) denote the unique hybrid vertex of B′. Let C_p, C_q ∈ Cut(N′) be such that p ∈ C_p and q ∈ C_q.
We claim that C_p and C_q are distinct elements in Cut(N′) − {C_x, C_y, C_z}. To see this, note first that, since the sets C_x, C_y and C_z are pairwise distinct and t, t′ ∈ R(N′), it follows that one of x, y, and z must be contained in C_{N′}(h′). Without loss of generality, assume that z ∈ C_{N′}(h′) = C_z. Note next that C_p ≠ C_q. Indeed, at least two elements of {x, y, z} are not contained in C_p, because C_x, C_y and C_z are distinct. Suppose, without loss of generality, x ∉ C_p. If C_p = C_q, then only the triplet pq|x will be contained in R(N′), contradicting the fact that t_1 and t_2 are distinct triplets on {x, p, q} contained in R(N′). It remains to show that C_p, C_q ∉ {C_x, C_y, C_z}. Assume for the sake of contradiction that p ∈ C_x. Then only xp|q is in R(N′), because q ∉ C_x, contradicting the fact that both t_1 and t_2 are in R(N′). Similarly, if p is in C_y, then at most one of the two triplets on {y, p, q} is in R(N′), and if p is in C_z, at most one of the two triplets on {z, p, q} is in R(N′). So C_p ∉ {C_x, C_y, C_z}. By a symmetrical argument, C_q ∉ {C_x, C_y, C_z}. This proves the claim.
By the previous claim, neither p nor q is in C_z. Since x ∉ C_z it follows that t_1 and t_2 cannot both be contained in R(N′), which gives the final contradiction. Thus, Cut(N) ⊆ Cut(N′), as required.
Since both Cut(N) and Cut(N′) are partitions of X, it follows that Cut(N) = Cut(N′).

Triplet systems that L_1(X)-define
As is well-known, every binary phylogenetic tree T on X is defined by the triplet set R(T) induced by T, where a phylogenetic tree S on X is said to be defined by a triplet system R on X if, up to equivalence, S is the unique phylogenetic tree on X for which R ⊆ R(S) holds (see e.g. [18]). In this context it is important to note that this uniqueness only holds within the space of phylogenetic trees because all networks N ∈ L_1(X) that display T have the property that R(T) ⊆ R(N). Combined with the fact that the network N pictured in Fig. 6(i) is, up to equivalence, the only binary level-1 network on X = {x_1, ..., x_4} that is consistent with all five triplets depicted in Fig. 6(ii) (a simple case analysis can be applied to verify this) and |R(N)| = 7, it is natural to ask how many triplets suffice to "uniquely determine" a level-1 network. In this section we provide a partial answer to this question. More precisely, saying that a network N ∈ L_1(X) is L_1(X)-defined by a triplet system R (on X) if, up to equivalence, N is the unique network in L_1(X) such that R ⊆ R(N) holds, we show that every 4-outwards network N in L_1(X) that is also simple is L_1(X)-defined by a triplet system of size at most 2|X| − 1. In addition, we show that if the requirement that Since with ) and so R_1 ⊆ R_2 holds it follows that Collapse(N) and Collapse(N′) must be equivalent.
We next analyze the level-1 networks N_v of N with v ∈ X*. Let v ∈ X* and let C ∈ Cut(N) be such that v ∈ C. Note that if |C| = 1 then N_v is an isolated vertex and thus a rooted DAG with leaf set {v}. So assume that |C| ≥ 2. Then since N is a saturated, 4-outwards network in L_1(X), N_v is a saturated, 4-outwards network in L(C). Since N_v has at most g − 1 galls the induction hypothesis implies that N_v is L(C)-defined by R(N_v). By assumption, R(N) ⊆ R(N′) and so R(N_v) ⊆ R(N′_v). Thus N′_v and N_v must be equivalent. Combined with the observation that the networks Collapse(N) and Collapse(N′) are equivalent it follows that N and N′ are equivalent.

L_1(X)-defining cluster systems
In this section, we turn our attention to the companion question of Section 6, that is, whether some (not necessarily proper) subset of S(N) is sufficient to "uniquely determine" a 4-outwards network N in L_1(X). We first formalize the idea of "uniquely determining" as being L_1(X)-defined for cluster systems. We then show that all 4-outwards networks N ∈ L_1(X) that are also simple are L_1(X)-defined by a cluster system of size at most |X| (Theorem 7.1 and Corollary 7.2). Replacing the requirement that N is simple by the more general requirement that N is saturated, we also show that such networks are L_1(X)-defined by their induced softwired cluster system (Theorem 7.3).
Let N denote a phylogenetic network on X and let S denote a cluster system on X. Then we say that N displays S (in the softwired sense) if S ⊆ S(N) holds. Furthermore, we say that a network N ∈ L_1(X) is L_1(X)-defined by a cluster system S on X if, up to equivalence, N is the unique network in L_1(X) that displays S. It should be noted that, as in the case of triplet systems, a binary phylogenetic tree T on X is not L_1(X)-defined by its induced cluster system C(T) = S(T). The reason is again that, by subdividing arcs of T and adding new arcs joining the subdivision vertices, we can transform T into a network N in L_1(X) for which C(T) ⊆ S(N) holds. Also it should be noted that a network in L_1(X) is not L_1(X)-defined by its induced hardwired cluster system. Analogous to the triplet result presented in Section 6, a 4-outwards network N ∈ L_1(X) also need not be L_1(X)-defined by S(N). Indeed, consider again the two 4-outwards networks N_1 and N_2 on X = {x_1, ..., x_5} presented in Figures 3(i) and 2(i), respectively. Then N_1 and N_2 are clearly not equivalent but S(N_1) ⊆ S(N_2).
Theorem 7.1. Let X = {x_1, ..., x_n}, n ≥ 4, and suppose that N is a simple network in L_1(X) such that, when starting at the hybrid vertex v_1 of N and traversing the unique cycle C of U(N) counter-clockwise, the obtained vertex ordering for and x_j is a child of v_j, for all 1 ≤ j ≤ i − 1, and x_j is a child of v_{j+1}, for all i ≤ j ≤ n. Assume without loss of generality that i − 2 ≥ n − i + 1, i.e. that the right side of the gall contains at least as many leaves as the left side. Then N is L_1(X)-defined by the cluster system S_d(X) where
Proof. Let N_1 ∈ L_1(X) be a network such that S_d(N) ⊆ S(N_1). We first claim that N_1 must be simple. Assume for the sake of contradiction that N_1 is not simple, that is, N_1 contains a non-trivial cut arc (u, v). Then every cluster in S(N_1) must be compatible with C = C_{N_1}(v), 2 ≤ |C| < n, and C ∈ S(N_1). We will derive a contradiction by showing that S_d(N), and thus also S(N_1), contains at least one cluster that is incompatible with C.
Case (i). We distinguish between the two alternatives that x_1 ∈ C and that hold either and so C and C′ are incompatible, as required. Now, suppose x_1 ∉ C. Then since 2 ≤ |C| there exist p, q ∈ {2, ..., n} with p < q, say, such that x_p, x_q ∈ C. Clearly, x_q ∉ C′ := {x_1, ..., x_p} ∈ S_d(N). But then C′ and C are again incompatible, as required.
A similar analysis holds for cases (ii) and (iii); we leave the details to the interested reader. Hence, N_1 must be simple, as claimed.
Let h denote the unique hybrid vertex of N_1 and let x denote the leaf of N_1 that is incident with h. For the remainder of the proof, we consider each of the three cases stated in the theorem separately. All three cases use the following observations: For ease of presentation we will liberally make use of the assumption that S_d(N) ⊆ S(N_1) without explicitly stating it.
Case (i). First, we argue that x ∈ {x_1, x_2}. Assume for the sake of contradiction that x ∉ {x_1, x_2}. Then C = {x_1, x_2} and C′ = {x_2, ..., x_n} \ {x} are incompatible and clearly contained in S_d(N)|_{X−x} ⊆ S(N_1)|_{X−x}. Hence, S(N_1)|_{X−x} is not compatible, which is impossible because x is incident with h and so N_1|_{X−x} is a phylogenetic tree. So x ∈ {x_1, x_2}. In fact, similar reasoning implies that x = x_2 is also impossible as otherwise S_d(N)|_{X−x} would contain incompatible clusters {x_1, x_3} and {x_3, ..., x_n}. So x = x_1. Since {x_1, x_2} ∈ S_d(N) it follows that the other child of the parent of x_2 in N_1 is h. Combined with the fact that ⋃_{2≤j≤n−1} {{x_1, x_2, ..., x_j}} ⊆ S_d(N) it follows that the other child of the parent of and {x_1, x_n}, leading to a contradiction of the fact that N_1|_{X−x} is a phylogenetic tree. In fact, similar arguments utilizing the facts that {x_1, x_2, x_3} ∈ S_d(N) and that n ≠ 3 imply that x ≠ x_2 and x ≠ x_n. So again x = x_1. Since {x_1, x_2} and {x_1, x_n} are contained in S_d(N) ⊆ S(N_1) it follows that the other child of the parents of x_2 and x_n in N_1, respectively, is h. In view of {x_1, x_2, x_3} ∈ S_d(N) ⊆ S(N_1) we see that the other child of the parent of x_3 in N_1 must be the parent of x_2. Since ⋃_{3≤j≤n−1} {{x_2, x_3, ..., x_j}} ⊆ S_d(N), similar arguments as in the previous case imply that N and N_1 must be equivalent.
Case (iii). Again the fact that N_1|_{X−x} is a phylogenetic tree implies that x ∈ {x_1, x_2, x_n}. However, x = x_n cannot hold because n − 1 ≠ 2 and so {x_1, x_2} and {x_1, x_{n−1}} are distinct clusters that are both contained in S_d(N)|_{X−x} and thus in S(N_1)|_{X−x}. But then S(N_1)|_{X−x} is incompatible, which is impossible as N_1|_{X−x} is a phylogenetic tree. Similarly, x ≠ x_2 as otherwise the two incompatible clusters {x_1, x_n} and {x_n, x_{n−1}, ..., x_i} are contained in S_d(N)|_{X−x}. So x = x_1. Focussing as in case (ii) on x_2 and x_n we see again that the common child of their respective parents is h. Since ⋃_{3≤j≤i−1} {{x_2, x_3, ..., x_j}} ∪ ⋃_{n−1≥j≥i} {{x_n, x_{n−1}, ..., x_j}} ⊆ S_d(N), the location of the remaining leaves of N_1 is forced. Hence, N_1 is equivalent to N.
As an immediate consequence of Theorem 7.1, we obtain the companion result for Theorem 6.1.
Corollary 7.2. Every simple network in L_1(X) with at least four leaves is L_1(X)-defined by a cluster system of size at most |X|.
We now prove the cluster equivalent of Theorem 6.3, i.e. that requiring that a 4-outwards network in L_1(X) is also saturated guarantees that it is uniquely determined by its induced softwired cluster system.
Theorem 7.3. Every 4-outwards network in L_1(X) that is also saturated is L_1(X)-defined by its induced softwired cluster system.
Proof. Let N and N′ be networks in L_1(X) such that N is 4-outwards and saturated and S(N) ⊆ S(N′) holds. We need to show that N′ is equivalent with N. Let 𝒯 = 𝒯(N). Clearly, ⋃_{T ∈ 𝒯} S(T) = S(N) ⊆ S(N′). Combined with [24, Proposition 1] which implies that R(𝒯) ⊆ R(N′) and the fact that R(N) = R(𝒯) it follows that R(N) ⊆ R(N′). Since, by Theorem 6.3, N is L_1(X)-defined by R(N) it follows that N and N′ are equivalent.
In fact, due to the very general character of [24, Proposition 1], Theorem 7.3 can easily be extended to prove that, whenever R(N) has been proven sufficient to uniquely determine (in our sense) the members N of a specified subfamily (any subfamily) of phylogenetic networks, so too is S(N), where we canonically extend the notions of an induced triplet system and softwired cluster system to such networks.

Conclusions
In this paper, we have presented enumerative results concerning the number of vertices, arcs, and galls of a binary level-1 network. By focusing on triplet systems and (softwired) cluster systems we have also investigated the question of whether subsets of those systems suffice to uniquely determine the binary level-1 network that induced them. As part of this, we have presented examples that illustrate that a level-1 network need not be uniquely determined by the triplet/cluster system it induces, thus illustrating the difference between the notion of encoding and our formalization of uniquely determining. In addition, we have provided bounds on the size of such a system in case the network in question is simple and has at least four leaves. For the more general class of 4-outwards, saturated, binary level-1 networks we have shown that any network in that class is uniquely determined by the triplet/softwired cluster system it induces. However, a number of open questions remain. For example, for which binary level-1 networks are the aforementioned bounds sharp, and are 4-outwards saturated binary level-1 networks characterizable by the fact that they are uniquely determined by their induced triplet/softwired cluster system?
We conclude by remarking that in [11] trinets, that is, rooted directed acyclic graphs on just three leaves, have recently been introduced in the literature for phylogenetic network reconstruction. In that paper it was also shown that any level-1 network is encoded by the trinet system that it induces. In addition, it was shown in [25] that the more general tree-sibling and level-2 networks are encoded by their induced trinet systems, a property that does not hold in general for the triplet system or the softwired cluster system induced by such networks. Formalizing the idea of "uniquely determining" for trinet systems in a canonical way to L_1(X)-defining trinet systems, it might be interesting to explore what kind of trinet systems L_1(X)-define such networks.

Figure 4. Two binary level-1 networks N_1 and N_2 on X′ ∪ {a, b, c, d} for which the respective numbers of leaves, galls, and non-trivial cut arcs are the same yet |R(N_2)| ≠ |R(N_1)| (see the proof of Proposition 4.2 for details).