Distinct-Cluster Tree-Child Phylogenetic Networks and Possible Uses to Study Polyploidy

As phylogenetic networks become more widely studied and the networks grow larger, it may be useful to “simplify” such networks into especially tractable networks. Recent results have found methods to simplify networks into normal networks. By definition, normal networks contain no redundant arcs. Nevertheless, there may be redundant arcs in networks where speciation events involving allopolyploidy occur. It is therefore desirable to find a different tractable class of networks that may contain redundant arcs. This paper proposes distinct-cluster tree-child networks as such a class, here abbreviated as DCTC networks. They are shown to have a number of useful properties, such as quadratic growth of the number of vertices with the number of leaves. A DCTC network is shown to be essentially a normal network to which some redundant arcs may have been added without losing the tree-child property. Every phylogenetic network can be simplified into a DCTC network depending only on the structure of the original network. There is always a CSD map from the original network to the resulting DCTC network. As a result, the simplified network can readily be interpreted via a “wired lift” in which the original network is redrawn with each arc represented in one of two ways.


Stephen J. Willson (swillson@iastate.edu), Department of Mathematics, Iowa State University, Ames, IA 50011, USA

Introduction
A (rooted) phylogenetic tree is a tree in which the vertices correspond to biological species, the leaves are extant species, and the branchings correspond to speciation events, usually by mutation. Recently there has been increased interest in speciation events, such as hybridization and lateral gene transfer, that are not modeled well using trees (Delwiche and Palmer 1996; Doolittle and Bapteste 2007; Inagaki et al. 2002). Hence, there is interest in phylogenetic networks, in which some nodes may have more than one parent. Overviews of phylogenetic networks may be found in Moret et al. (2004), Huson et al. (2010), and Steel (2016).
There are various interesting classes of networks that have been investigated. Tree-child networks (Cardona et al. 2009) are those in which each vertex that is not a leaf has a child with in-degree one, called a tree-child. Normal networks (Willson 2010) are also of interest; they are tree-child with the additional property that they have no redundant arcs. A redundant arc, sometimes called a short-cut, is an arc (u, v) such that there is another directed path from u to v that does not include (u, v). More details are given in Sect. 2.
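Both definitions can be checked mechanically. The following is a minimal Python sketch, assuming a network is stored as a dictionary mapping each vertex to the list of its children; this representation, the function names, and the example networks are hypothetical illustrations rather than anything from the paper. It lists the redundant arcs and tests the tree-child property.

```python
from collections import defaultdict

def parent_lists(children):
    """Invert a child-adjacency dict into vertex -> list of parents."""
    parents = defaultdict(list)
    for u, kids in children.items():
        for w in kids:
            parents[w].append(u)
    return parents

def reaches_avoiding_arc(children, a, b):
    """True if b is reachable from a without using the arc (a, b).

    The search starts from the children of a other than b; in an acyclic
    network such a path can never return to a, so it never uses (a, b)."""
    stack = [c for c in children.get(a, []) if c != b]
    seen = set(stack)
    while stack:
        v = stack.pop()
        if v == b:
            return True
        for c in children.get(v, []):
            if c not in seen:
                seen.add(c)
                stack.append(c)
    return False

def redundant_arcs(children):
    """All arcs (a, b) for which another directed path from a to b exists."""
    return [(a, b) for a, kids in children.items() for b in kids
            if reaches_avoiding_arc(children, a, b)]

def is_tree_child(children):
    """Every non-leaf vertex needs a child of in-degree one (a tree-child)."""
    parents = parent_lists(children)
    return all(any(len(parents[w]) == 1 for w in kids)
               for kids in children.values() if kids)

# Hypothetical network: h is a hybrid vertex with parents a and b.
N = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"],
     "x": [], "y": [], "z": []}
M = dict(N, r=["a", "b", "h"])  # add the arc (r, h)
print(redundant_arcs(N), is_tree_child(N))  # [] True
print(redundant_arcs(M), is_tree_child(M))  # [('r', 'h')] True
```

The second network shows that adding a redundant arc need not destroy the tree-child property, which is the situation the rest of the paper is concerned with.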
A vertex v is visible to a leaf x provided every path from the root to x contains the vertex v. The vertex v is visible if it is visible to some leaf (Francis et al. 2021). If v is visible to a leaf x, then the genome at v can have a strong direct influence on the genomic inheritance at x. An example will be seen in Fig. 6.
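Visibility can be tested directly from this definition: v is visible to x exactly when deleting v leaves no path from the root to x. The sketch below assumes the same hypothetical child-dictionary representation; the function names and the example network are illustrative only.

```python
def reachable(children, start, removed=None):
    """Vertices reachable from start along directed paths that avoid `removed`."""
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for c in children.get(u, []):
            if c != removed and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen

def visible_to(children, root, v, leaf):
    """True if every path from the root to `leaf` contains v."""
    if leaf not in reachable(children, root):
        return False
    if v == root:
        return True
    # v lies on every root-to-leaf path iff deleting v disconnects the leaf.
    return leaf not in reachable(children, root, removed=v)

def is_visible(children, root, v, leaves):
    """v is visible if it is visible to at least one leaf."""
    return any(visible_to(children, root, v, x) for x in leaves)

# Hypothetical network: a's only child is the hybrid h, so r -> b -> h -> x bypasses a.
Q = {"r": ["a", "b"], "a": ["h"], "b": ["h", "y"], "h": ["x"], "x": [], "y": []}
print(is_visible(Q, "r", "a", ["x", "y"]))  # False
print(is_visible(Q, "r", "h", ["x", "y"]))  # True
```

Note that Q is not tree-child (a has no tree-child), which is consistent with the fact that in tree-child networks every vertex is visible.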
As phylogenetic networks grow more complicated, their interpretation also becomes more complicated. Tree-child networks are particularly useful because every vertex is visible (see Cardona et al. 2009;Huson et al. 2010, and Sect. 4). Nevertheless, general tree-child networks are awkward since for a given number n of leaves, the number of vertices can be unbounded (Cardona et al. 2009).
Recently there has been interest in "simplifying" a general phylogenetic network into a normal network. Normal networks are tree-child but more tractable, because the number of vertices grows at most quadratically with n. Suppose N is a phylogenetic network. Francis et al. (2021) have proposed a "normalization," which in this paper I will denote FHS(N). A fast procedure, PhyloSketch (Huson and Steel 2020), is available to compute FHS(N). The current author (Willson 2022) has proposed a different construction of a normal network, denoted Norm(N).
One mechanism of speciation is polyploidization (Marcussen et al. 2015; Jones et al. 2013), in which the new species arises with twice the chromosomes, containing the whole genomes of both parents. Such doubling of chromosomes is a very strong biological signal. Figure 4 of Marcussen et al. (2015) proposes 21 such allopolyploidization events, leading to 21 reticulations in the network. Of these, four have an incoming arc that is redundant. Perhaps that fact is not surprising; the two parental species would probably be very closely related. Both Francis et al. (2021) and Willson (2022) apply their methods to the network in Marcussen et al. (2015) to find related normal networks. These normal networks contain many fewer reticulations. By definition, a normal network contains no redundant arcs. If speciation events involving redundant arcs are common, then insisting on normal networks discards a lot of biological signal.
Moreover, Degnan (2018) argues that "ghost lineages" involving unsampled or extinct taxa can lead to reticulations involving redundant arcs or even parallel arcs. Similarly, the discussion in Francis et al. (2021) gives an example where an extinct lineage yields a redundant arc when only extant taxa are at the leaves. Finally, some of the scenarios in Fig. 1 of Jones et al. (2013) show redundant arcs.
It therefore might be useful to biologists to simplify phylogenetic networks into a different class of networks, still tractable, but that may contain redundant arcs. This paper proposes such a class, here called DCTC networks. They are formally introduced in Sect. 4.
Roughly, DCTC networks are defined by two properties: (1) no two vertices have the same cluster (except possibly a leaf and its parent), so they have "distinct clusters," abbreviated DC; and (2) they are tree-child, abbreviated TC.
Since they have both properties, they are called distinct-cluster tree-child networks, or, more briefly, DCTC networks.
We shall see that a DCTC network N has additional interesting properties which in many ways resemble those of normal networks. In particular, a DCTC network satisfies the following:
(3) Suppose N has n leaves. Then the number of vertices of N is at most (n^2 + n + 2)/2. This upper bound is shown to be tight.
(4) Every vertex is visible.
(5) The number of hybrid vertices is at most n − 2.
(6) For every vertex v with out-degree at least 2, there exist two distinct leaves x, y such that v is the most recent common ancestor of x and y.
Property (4) is true of all tree-child networks (Cardona et al. 2009), hence of all normal and DCTC networks; Properties (5) and (6) are also true of normal networks by results in Steel (2016) and Willson (2010), respectively. The estimate in Property (3) is very similar to a result for normal networks (Willson 2010), for which (if no vertex has out-degree one) the number of vertices is at most (n^2 + n)/2. By contrast, if a TC network has n leaves, then the number of vertices can be unbounded. If a DC network has n leaves, then the number of vertices is bounded above by 2^n + n. It is interesting that a DCTC network, satisfying both conditions, can have at most O(n^2) vertices.
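The three vertex bounds just compared are closed-form expressions in n, so a short sketch can tabulate them. The function names are hypothetical; the formulas are the ones stated in this paragraph and in Property (3).

```python
def normal_bound(n):
    """(n^2 + n)/2: normal networks with no vertex of out-degree one."""
    return n * (n + 1) // 2

def dctc_bound(n):
    """(n^2 + n + 2)/2: DCTC networks (Property (3))."""
    return (n * n + n + 2) // 2   # n(n + 1) is even, so the division is exact

def dc_bound(n):
    """2^n + n: distinct-cluster networks."""
    return 2 ** n + n

for n in (3, 5, 10, 20):
    print(n, normal_bound(n), dctc_bound(n), dc_bound(n))
```

For every n the DCTC bound exceeds the normal bound by exactly one, since (n^2 + n + 2)/2 − (n^2 + n)/2 = 1, while the DC bound alone grows exponentially.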
The connection with normal networks is further studied in Sect. 6, where a DCTC network is shown to be essentially a normal network to which some redundant arcs may have been added while retaining the tree-child property. Moreover, in Sect. 8 we show that normal networks can also be easily modified into DCTC networks, although without redundant arcs.
In Willson (2012) and later (Willson 2022), the author studied CSD maps from one network to another. The topic will be reviewed in Sect. 3. Briefly, a CSD map ψ : N → N′ consists of a surjective map ψ : V(N) → V(N′) on the vertex sets with interesting properties concerning the arcs. Such maps often correspond to some kind of "simplification" of N. Suppose ψ : N → N′ is a CSD map. In that situation, the simplified network N′ can be visualized using a wired lift of N′ into N (initially defined in Willson (2012) and considerably extended in Willson (2022)). This wired lift redraws N including every arc, but each arc is drawn in one of a small number of ways so that N′ can be recognized in the modified drawing of N. Moreover, there is a path from ψ(u) to ψ(v) in N′ if and only if there is a g-path from u to v in the wired lift diagram (Willson 2022). (See Sect. 3 for more details.) Such a diagram can make it easier to interpret the simplification.
Here is a rough statement of the final result in this paper, in Sect. 9. Suppose N is a phylogenetic network for which the leaf set is identified with a set X. There is a procedure that systematically finds a DCTC network (here called DCTC(N)) for which there is a CSD map ψ : N → DCTC(N). The network DCTC(N) depends only on the structure of N. By Property (3), if N has n leaves, then DCTC(N) has at most (n^2 + n + 2)/2 vertices, hence has bounded complexity. There is a wired lift of DCTC(N) into N. The construction resembles that in Willson (2022) for normal networks. The objective is to find a network related to N with strong internal confirmation of features such as reticulations.
In Sect. 11, the procedure is applied to several examples, two with real data. In particular, Example 2 applies it to the network N of Marcussen et al. (2015), which studies allopolyploidization in Viola. The resulting network does in fact retain many more reticulations than were in the normalizations found in Francis et al. (2021) or Willson (2022), and it contains many redundant arcs.
A concluding discussion section treats biological interpretations of DCTC(N ).

Basic Notions
The properties assumed in this paper are the same as in Willson (2022), which may serve as a reference for more detail. Briefly, suppose X is a finite set (typically a set of extant species in the biological applications). An X-network N = (V, A, ρ, φ) is a finite acyclic directed graph (V, A), where V is a finite set of vertices and A is a finite set of arcs. There are no directed cycles, there are no loops (a, a), and there is at most one arc (a, b) for a ≠ b. The in-degree of a vertex v in N, denoted indeg(v) or indeg(v; N), is the number of arcs (u, v), i.e., the number of parents of v. The out-degree of a vertex v, denoted outdeg(v), is the number of arcs (v, u), i.e., the number of children of v. Here ρ is a vertex of in-degree 0, called the root; it is the only vertex with in-degree 0. A leaf is a vertex x ∈ V with out-degree 0. The map φ : X → V is a one-to-one map with image the set of leaves.
Occasionally we may have to deal with X -networks except that cycles are possible. If that occurs we will explicitly specify that the network is not necessarily acyclic.
Except where specified otherwise, we assume that each leaf φ(x) for x ∈ X is a vertex with in-degree 1 and hence has a unique parent, which is denoted p(x; N) or p(x). The arc of the form (p(x), φ(x)) for some x ∈ X will be called the x-arc. If x is not specified, any such arc will be called an X-arc.
Note that we make no assumption about the network being binary. A vertex may have in-degree greater than 2 or out-degree greater than 2 or both.
A path in N from a to b is a sequence a = u_0, u_1, …, u_k = b of vertices such that (u_i, u_{i+1}) ∈ A for 0 ≤ i < k. Paths in N are thus directed. If there is a path from u to v, then we write u ≤ v, and ≤ is a partial order of V. For every vertex v, it is true that ρ ≤ v. We may write u < v to mean u ≤ v and u ≠ v.
Suppose x and y are distinct vertices. A vertex u is a common ancestor of x and y if u ≤ x and u ≤ y. A vertex v is a most recent common ancestor of x and y, denoted mrca(x, y) or sometimes mrca(x, y; N), if v is a common ancestor of x and y and, in addition, for every common ancestor u of x and y, u ≤ v. A most recent common ancestor mrca(x, y) need not exist, and an example will be given in Fig. 6. If a most recent common ancestor of x and y exists, it is unique.
An arc (a, b) is redundant, or a short-cut, if there exists a path a = u_0, u_1, …, u_n = b with n ≥ 2 that does not contain the arc (a, b).
A vertex is hybrid if its in-degree is at least 2. The cluster of a vertex v is cl(v) = {x ∈ X : v ≤ φ(x)}, and Cl(N) denotes the set of all clusters of vertices of N.
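The definition of mrca(x, y) translates directly into a check on ancestor sets: u is the mrca exactly when u is a common ancestor and every common ancestor lies among the ancestors of u. The sketch below assumes the same hypothetical child-dictionary representation; its names and example networks are illustrative only.

```python
from collections import defaultdict

def ancestor_sets(children):
    """Map each vertex v to {u : u <= v}, the vertices with a directed path to v."""
    parents = defaultdict(list)
    vertices = set(children)
    for u, kids in children.items():
        vertices.update(kids)
        for w in kids:
            parents[w].append(u)
    anc = {}
    for v in vertices:
        seen, stack = {v}, [v]
        while stack:
            for p in parents[stack.pop()]:
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        anc[v] = seen
    return anc

def mrca(children, x, y):
    """Return mrca(x, y) if it exists, else None.

    u qualifies when every common ancestor w satisfies w <= u,
    i.e. all common ancestors lie in anc[u]."""
    anc = ancestor_sets(children)
    common = anc[x] & anc[y]
    for u in common:
        if common <= anc[u]:
            return u
    return None

# Hypothetical example: p and q are incomparable common ancestors of u and w.
T = {"r": ["p", "q"], "p": ["h1", "h2"], "q": ["h1", "h2"],
     "h1": ["u"], "h2": ["w"], "u": [], "w": []}
print(mrca(T, "u", "w"))  # None
```

Because the mrca is unique when it exists, returning the first qualifying vertex is safe.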
There are several types of X-networks which will be of interest. An X-network is successively cluster-distinct (SCD) (Willson 2012) if each arc (a, b) satisfies either cl(a) ≠ cl(b), or (a, b) = (p(x), φ(x)) for some x ∈ X. Thus, successive vertices have different clusters except possibly for the arc (p(x), φ(x)) entering a leaf. The definition is slightly modified from Willson (2012) because of our requirement that each leaf φ(x) must have in-degree one.
An X-network N is tree-child (Cardona et al. 2009) if every vertex that is not a leaf has a tree-child.
An X-network N = (V, A, ρ, φ) (possibly with hybrid leaves) is regular (Baroni et al. 2004) if (1) the cluster map cl : V → P(X) is one-to-one, where P(X) is the power set of X; (2) N has no redundant arcs; and … Note that because of (3), any regular network is SCD.
Since this paper studies approximation of one network by another, we utilize a numerical distance between two arbitrary networks with the same leaf set for comparisons, as in Willson (2022). Let N and N′ be X-networks. One interesting way to compare them is their Robinson-Foulds distance d_RF(N, N′). Here d_RF(N, N′) is defined as the number of members of Cl(N) and Cl(N′) which are present in one but not both. This definition is an extension of the notion for trees given in Robinson and Foulds (1981). For certain classes of X-networks, d_RF is a metric. As an example, for fixed X, it is a metric on the collection of regular X-networks (Baroni et al. 2004).
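Since d_RF(N, N′) is the size of the symmetric difference of Cl(N) and Cl(N′), it can be computed from the clusters alone. The sketch below assumes the hypothetical child-dictionary representation, with each leaf named by its taxon; the function names and examples are illustrative.

```python
def clusters(children):
    """Map each vertex v to cl(v), the set of leaves below v.

    Assumes each leaf is a vertex with no children, named by its taxon."""
    vertices = set(children) | {c for kids in children.values() for c in kids}
    memo = {}

    def cl(v):
        if v not in memo:
            kids = children.get(v, [])
            memo[v] = (frozenset([v]) if not kids
                       else frozenset().union(*(cl(c) for c in kids)))
        return memo[v]

    return {v: cl(v) for v in vertices}

def rf_distance(n1, n2):
    """d_RF: the number of clusters in exactly one of Cl(N1) and Cl(N2)."""
    return len(set(clusters(n1).values()) ^ set(clusters(n2).values()))

N = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"],
     "x": [], "y": [], "z": []}
T = {"r": ["a", "y"], "a": ["x", "z"], "x": [], "y": [], "z": []}
print(rf_distance(N, T))  # 1: only {y, z} is not shared
```

Clusters are collected as a set, so two vertices with the same cluster (such as a leaf and its hybrid parent) contribute only one member of Cl(N).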

Prior Results
This section states results from Willson (2022) that will be needed, especially in Sect. 9. Let N = (V, A, ρ, φ) and N′ = (V′, A′, ρ′, φ′) be X-networks. Connected surjective digraph (CSD) maps were introduced in Willson (2012). If ψ_1 : N → N′ and ψ_2 : N′ → N″ are CSD maps, then it is proved in Willson (2012) that the composition ψ = ψ_2 ∘ ψ_1 : N → N″ is also a CSD map. If both maps are leaf-preserving, then so is the composition.
Let ψ : N → N′ be a leaf-preserving CSD map, and let 2^V denote the set of subsets of V. A wired lift of ψ (or of N′ into N) is a pair (ψ^{-1}, E_1), where ψ^{-1} : V′ → 2^V is the map given by ψ^{-1}(v′) = {v ∈ V : ψ(v) = v′} and where E_1 ⊆ A satisfies two conditions (see Willson 2022). We will say the arc (u, v) represents (u′, v′), or is a pre-arc of (u′, v′).
Call the members of E_1 the representative arcs, since each represents an arc of A′. Note that the collection of all …
Theorem 3.1 Willson (2022) Suppose N and N′ are X-networks and ψ …
The wired lift (ψ^{-1}, E_1) can be visualized using a diagram of N. An example is shown in Fig. 1. The diagram is exactly the diagram of N except that each arc may be solid or dashed. Suppose N = (V, A, ρ, φ). For every arc (u, v) ∈ A such that ψ(u) = ψ(v), draw the arc as a dashed arrow. Dashed arcs make the sets ψ^{-1}(v′) apparent in N, and each vertex of N′ corresponds to a connected component of the dashed arcs. Each arc (u′, v′) ∈ A′ has at least one corresponding solid arc (u, v) ∈ A, justifying the word "lift." The "wires" are the dashed arcs. In Willson (2022), the wired lifts for normal networks could contain three types of arcs: wide solid, thin solid, and dashed. For DCTC networks, however, only two types are needed, and for ease of visualization we choose solid and dashed. The solid arcs in this paper correspond to the wide solid arcs in Willson (2022), while the dashed arcs in this paper correspond to the thin solid arcs in the previous paper.
Fig. 1 The upper graph shows the wired lift as a diagram of N. Dashed arcs indicate identification of the vertices and can be followed in either direction. Solid arcs must be followed in their direction. If all arcs were solid, then the upper diagram would be exactly N. The lower diagram shows N′. Note that, as indicated by the dashed arcs, 7 and 9 are identified into [7,9]. Similarly, 8 and 12 are identified into [8,12], while 15 and 16 are identified into [15,16]. The map ψ satisfies, for example, ψ(8) = ψ(12) = [8,12], ψ(7) = ψ(9) = [7,9], and ψ(11) = 11.
Let N = (V, A, ρ, φ) and N′ = (V′, A′, ρ′, φ′) be X-networks, with ψ : V → V′ a CSD map, and suppose (ψ^{-1}, E_1) is a wired lift of ψ. If u and v are in V, we say there is an allowed step from u to v if either (u, v) ∈ E_1, or ((u, v) ∈ A and ψ(u) = ψ(v)), or ((v, u) ∈ A and ψ(u) = ψ(v)). Note that the step either follows a solid arc in E_1 forwards or else follows a dashed arc, possibly forwards, possibly backwards.
A generalized path or g-path in N from a to b is a sequence a = u_0, u_1, …, u_k = b of vertices such that for i = 0, …, k − 1, there is an allowed step from u_i to u_{i+1}.
In Fig. 1, let N denote the initial X-network and N′ be such that ψ : N → N′ is a CSD map, where the upper part of Fig. 1 is the wired lift. If all arcs in the upper part of Fig. 1 were solid, then the figure would show N. Because of the dashed arcs, we see that ψ(8) = ψ(12), ψ(7) = ψ(9), and ψ(15) = ψ(16). There is a g-path 12, 8, 11, 1; hence, in N′ there is a path from ψ(12) to ψ(1) = 1. In N, there is clearly no path from 12 to 1. Note also that all children of 12 in N are hybrid. But ψ(12) = [8,12] in N′ has the tree-child 11. In fact, N′ is tree-child.
Theorem 3.2 Willson (2022) Let N = (V, A, ρ, φ) and N′ = (V′, A′, ρ′, φ′) be X-networks, with ψ : N → N′ a CSD map. There is a path from ψ(a) to ψ(b) in N′ if and only if there is a g-path in N from a to b.
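By Theorem 3.2, reachability in N′ can be tested entirely inside the diagram of N by searching over allowed steps. The following breadth-first search is a minimal sketch of that idea. The inputs (the arc list of N, a dictionary for ψ, and the set E_1) and the small wired lift in the example are hypothetical.

```python
from collections import defaultdict, deque

def g_path_exists(arcs, psi, rep_arcs, a, b):
    """Breadth-first search over the allowed steps of a wired lift.

    arcs: the arcs A of N; psi: dict giving psi(v) for each vertex v of N;
    rep_arcs: the representative (solid) arcs E_1."""
    step = defaultdict(set)
    for u, v in rep_arcs:            # solid arcs: forwards only
        step[u].add(v)
    for u, v in arcs:                # dashed arcs: either direction
        if psi[u] == psi[v]:
            step[u].add(v)
            step[v].add(u)
    seen, queue = {a}, deque([a])
    while queue:
        u = queue.popleft()
        if u == b:
            return True
        for v in step[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False

# Hypothetical wired lift: p and q are identified by psi, so (p, q) is dashed.
arcs = [("p", "q"), ("p", "s"), ("q", "t")]
psi = {"p": "[p,q]", "q": "[p,q]", "s": "s", "t": "t"}
E1 = [("p", "s"), ("q", "t")]
print(g_path_exists(arcs, psi, E1, "q", "s"))  # True: q back to p, then p -> s
print(g_path_exists(arcs, psi, E1, "s", "q"))  # False
```

As with the g-path 12, 8, 11, 1 in Fig. 1, the path from q to s exists only because the dashed arc may be followed backwards; N itself has no path from q to s.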
Let N = (V, A, ρ, φ) be an X-network. A subset K ⊆ A of arcs is strongly closed if it satisfies the following: Suppose there are vertices a, u_0, u_1, …
Theorem 3.3 Willson (2022) Let N = (V, A, ρ, φ) be an X-network and D ⊆ A be a subset of arcs. There exists a unique K ⊆ A such that (i) D ⊆ K, (ii) K is strongly closed, and (iii) for every strongly closed C ⊆ A such that D ⊆ C, it follows that K ⊆ C.
Thus, K is the unique minimal strongly closed subset of A containing D.
The subset K of the theorem is denoted K (D) and called the strong closure of D.
Theorem 3.4 Willson (2022) Suppose N = (V, A, ρ, φ) is an X-network and D ⊆ A contains no X-arc. There is a uniquely determined X-network M_D(N) such that (1) there is a projection map ψ : N → M_D(N) which is a leaf-preserving CSD map.
The idea of M_D(N) is relatively simple. The set D consists of a list of arcs in N. For each arc (a, b) ∈ D, we contract the arc in N to a point. If these contractions result in any directed cycles, we contract the arcs in any such cycle. The result is M_D(N), which is shown in Willson (2022) to be a well-defined acyclic network. Since M_D(N) is obtained by contracting certain arcs of N, it is a kind of quotient graph of N.
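The informal description above (contract the arcs of D, then contract any directed cycles that appear) can be sketched with a union-find structure followed by merging of mutually reachable classes, i.e., the strongly connected components of the intermediate quotient. This is only an illustration of that description, not the strong-closure construction of Willson (2022); the function name, the representation, and the example are hypothetical.

```python
from collections import defaultdict

def contract(arcs, D):
    """Contract the arcs in D, then collapse any directed cycles that result.

    Returns (psi, quotient_arcs); psi maps each vertex to a representative vertex."""
    vertices = {v for arc in arcs for v in arc}
    rep = {v: v for v in vertices}

    def find(v):
        while rep[v] != v:
            rep[v] = rep[rep[v]]   # path halving
            v = rep[v]
        return v

    def union(u, v):
        rep[find(u)] = find(v)

    for u, v in D:                 # step 1: contract each arc of D
        union(u, v)

    succ = defaultdict(set)        # quotient digraph after step 1, loops dropped
    for u, v in arcs:
        if find(u) != find(v):
            succ[find(u)].add(find(v))

    def reach(s):
        seen, stack = {s}, [s]
        while stack:
            for w in succ[stack.pop()]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    classes = {find(v) for v in vertices}
    R = {c: reach(c) for c in classes}
    for c in classes:              # step 2: merge mutually reachable classes
        for d in R[c]:
            if c in R[d]:
                union(c, d)

    psi = {v: find(v) for v in vertices}
    quotient = {(psi[u], psi[v]) for u, v in arcs if psi[u] != psi[v]}
    return psi, quotient

# Hypothetical example: contracting (a, c) creates the cycle [a,c] -> b -> [a,c].
arcs = [("r", "a"), ("a", "b"), ("b", "c"), ("a", "c"), ("c", "x"), ("r", "y")]
psi, quotient = contract(arcs, [("a", "c")])
print(psi["a"] == psi["b"] == psi["c"], len(quotient))  # True 3
```

Merging strongly connected components once is enough here, because the condensation of any digraph is acyclic.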
We now return to the proof of Theorem 3.3 (Theorem 3.7 of Willson (2022)). That proof constructs a sequence D_0, D_1, … of subsets of A that starts with D_0 = D and ends with K(D). An easy inductive argument shows that each D_i satisfies D_i ⊆ E.
Note that (4) shows that if f : N → N′ is a CSD map which contracts the arcs of D and N′ is an acyclic X-network, then all the identifications in M_D(N) occur also in N′. Thus, all the identifications in M_D(N) are needed to obtain an acyclic X-network that contracts the members of D.
As an application, there is the following result: Theorem 3.5 Willson (2022) Let N be an acyclic X-network. There is an X-network SCD(N) such that … (2) there is a leaf-preserving CSD map ψ : N → SCD(N).
Essentially, the computation of SCD(N ) contracts to a single vertex each arc (a, b) such that cl(a) = cl(b). Special attention is given to the case where b is a leaf to ensure that no leaf becomes hybrid.

Basic Properties of X-Networks that are Both Distinct-Cluster and Tree-Child
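An X-network N = (V, A, ρ, φ) is distinct-cluster (DC) if, whenever u, v ∈ V satisfy cl(u) = cl(v), then either u = v or else one of u and v is a leaf φ(x) for some x ∈ X and the other is p(x), where p(x) is hybrid with out-degree 1. Thus the only way to have two distinct vertices with the same cluster is a leaf φ(x) together with its hybrid parent p(x) of out-degree 1. The definition is intended to modify slightly the idea that no two vertices have the same cluster so as to be consistent with our assumption that each leaf has in-degree one.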
The following theorem restates some results in Cardona et al. (2009).
Theorem 4.1 Suppose N = (V, A, ρ, φ) is an X-network that is tree-child.
(1) Given any vertex v ∈ V that is not a leaf, there is a tree-child path from v to some leaf φ(x).
… Then v = p(x) by SCD and the claim is true. The same is true if v is a leaf. So we may assume that neither u nor v is a leaf. There is a tree-child path …
We will call an X-network N a DCTC X-network if it is acyclic, distinct-cluster (DC), and tree-child (TC).
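The three conditions in this definition can be checked together. The sketch below assumes the hypothetical child-dictionary representation with leaves named by taxa; it tests acyclicity with Kahn's algorithm, computes clusters children-first, and then applies the TC and DC conditions, allowing a repeated cluster only for a leaf φ(x) and a hybrid parent p(x) of out-degree 1.

```python
from collections import defaultdict

def is_dctc(children):
    """Test the acyclic, distinct-cluster (DC) and tree-child (TC) conditions.

    Assumes leaves are exactly the vertices with no children, named by taxon."""
    vertices = set(children) | {w for kids in children.values() for w in kids}
    leaves = {v for v in vertices if not children.get(v)}
    parents = defaultdict(list)
    for u, kids in children.items():
        for w in kids:
            parents[w].append(u)

    # Acyclicity: Kahn's algorithm must consume every vertex.
    indeg = {v: len(parents[v]) for v in vertices}
    order, queue = [], [v for v in vertices if indeg[v] == 0]
    while queue:
        u = queue.pop()
        order.append(u)
        for w in children.get(u, []):
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    if len(order) != len(vertices):
        return False

    # Clusters, computed children-first (reverse topological order).
    cl = {}
    for v in reversed(order):
        kids = children.get(v, [])
        cl[v] = frozenset([v]) if v in leaves else frozenset().union(*(cl[c] for c in kids))

    # TC: every non-leaf vertex has a child of in-degree one.
    if any(not any(len(parents[w]) == 1 for w in children[v])
           for v in vertices if v not in leaves):
        return False

    # DC: a repeated cluster is allowed only for a leaf and its parent p(x),
    # where p(x) is hybrid with out-degree one.
    groups = defaultdict(list)
    for v in vertices:
        groups[cl[v]].append(v)
    for group in groups.values():
        if len(group) == 1:
            continue
        if len(group) > 2:
            return False
        leaf, other = group if group[0] in leaves else group[::-1]
        if (leaf not in leaves or parents[leaf] != [other]
                or len(parents[other]) < 2 or len(children[other]) != 1):
            return False
    return True

# Hypothetical networks: M is N plus the redundant arc (r, h).
N = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"],
     "x": [], "y": [], "z": []}
M = dict(N, r=["a", "b", "h"])
print(is_dctc(N), is_dctc(M))  # True True
```

The network M is DCTC even though it contains a redundant arc, so it is not normal; this is the kind of network the class is designed to admit.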

Corollary 4.3 An X -network that is SCD and tree-child is a DCTC X -network.
In Fig. 2, M is DC but not TC since 9 has no tree-child. N is TC but not DC since cl(6) = cl(7) = {2, 3}.
Any tree-child network with n leaves has at most n − 1 hybrid vertices by results in Cardona et al. (2009). The following result is a slight improvement for DCTC networks. The bound n − 2 is the same as for normal networks; see Steel (2016).

Theorem 4.4 Suppose N = (V , A, ρ, φ) is a DCTC X -network with n leaves. Then the number of hybrid vertices is at most n − 2.
Proof Since N is TC, the root ρ must have a tree-child c_1. Since N is DC, cl(ρ) ≠ cl(c_1). Hence, ρ must have a child c_2 such that cl(c_2) is not contained in cl(c_1). There is a path from ρ to c_2. Choose a path from ρ to c_2 of maximal length. The child c of ρ on that path cannot be hybrid, since the other parent cannot be ρ and hence the path could be made longer. Hence, c is a tree-child. We cannot have c = c_1, because then c ≤ c_2, so cl(c_2) ⊆ cl(c) = cl(c_1), a contradiction. Hence, c and c_1 are two distinct tree-children of ρ.
Choose a tree-path from c_1 to the leaf φ(x) and a tree-path from c to the leaf φ(y). I claim that x ≠ y. Otherwise, if φ(x) = φ(y), by Theorem 4.1(2) either c_1 is on the path from c to φ(y) or c_1 ≤ c. The latter cannot happen, since the path would have to go through ρ, the unique parent of c. Hence, c_1 lies on the path from c to φ(y). By a symmetric argument, c lies on the path from c_1 to φ(x) = φ(y). Thus, c lies on a cycle, contradicting that N is acyclic.
Suppose the hybrid vertices are h_1, h_2, …, h_k. From h_i, there is a tree-child path to some leaf φ(x_i). By a similar argument, the members x, y, x_1, …, x_k are all distinct. Hence k + 2 ≤ n, so k ≤ n − 2.
In Fig. 2, note that 9 is not visible, since a path to 5 from the root 6 can pass through 10 and not 9, and a path to 3 from 6 can pass through 7 and not 9; moreover, 5 and 3 are the only leaf descendants of 9. On the other hand, 7 is visible, since every path from 6 to 1 passes through 7.
As is pointed out in Francis et al. (2021), if a vertex v is not visible, then the evolutionary history of gene flow in the corresponding phylogenetic network may have bypassed v, and the presence of v could have no genetic impact on any of the leaves. Hence, visibility of all vertices is a desirable property in a phylogenetic X-network.

Theorem 4.5 Suppose N = (V , A, ρ, φ) is a tree-child X -network. Then every vertex v is visible.
Proof This result is shown in Cardona et al. (2009), where instead of saying "u is visible since it is on every path from the root to φ(x)" the authors say "x is a strict descendant of u."
… Proof Every DCTC X-network and every normal X-network is tree-child.
… for some x ∈ X, and u is hybrid with out-degree 1; or (3) u has out-degree at least 2, and for each child c of u, cl(c) ⊊ cl(u).
Proof If outdeg(u) = 0, then (1) occurs. If outdeg(u) = 1, let c be the unique child of u. Then cl(u) = cl(c). Since N is DC by Theorem 4.2, (2) occurs. Otherwise outdeg(u) ≥ 2. If c is a child of u, since N is DC it follows that cl(c) ⊊ cl(u).

Fig. 3 M is DCTC but not normal while N is normal but not DCTC
The following facts about tree-child networks, proved in Cardona et al. (2009), necessarily apply to DCTC networks:
• Let m be the maximal in-degree of a hybrid vertex and n be the number of leaves. Then |V| ≤ (m + 2)(n − 1) + 1.
• For each X there is a metric (called the μ-distance) on the class of tree-child phylogenetic X-networks.
(2) By Part (1) there is a path from u to w and a path from w to v. Hence, (u, v) is redundant.

Corollary 4.9 If N is a DCTC X-network and cl(v) ⊊ cl(u), then u ≤ v.
The property of being DCTC is closely related to being normal, but not the same. Figure 3 shows a network M that is DCTC but not normal, because (4, 6) is redundant. On the right N is normal but not DCTC because cl(8) = cl(9) = {2, 3}. Nevertheless, we show below that given any DCTC X -network there is a closely related normal X -network.
The operator S. If N = (V, A, ρ, φ) is an X-network, let S(N) denote the result of contracting every arc (u, w) such that outdeg(u) = 1. This applies even if w is a leaf. The operator S was briefly introduced in Willson (2022). Some basic properties are given in the next theorem.
is an X -network except that a leaf may be hybrid. (4) ψ is a CSD map but need not be leaf-preserving.
It is clearly an acyclic X -network. For more details, see Willson (2022).
An important use of S will be to compute S(R(N)), where N is an acyclic X-network and R(N) is obtained from N by removing every redundant arc. Note that R(N) then has no redundant arcs, and often the result can be simplified further. The simplification might be performed by the operator S. If N is DCTC, we show below that S(R(N)) is normal. Figure 4 gives an example of the computation of S(R(N)). Start with the DCTC X-network N = (V, A, ρ, φ) on the left. The two redundant arcs are perfectly good arcs in N. Remove the two redundant arcs from A to form A′, so R(N) = (V, A′, ρ, φ). The result has vertices 9 and 11 with out-degree one. Compute S(R(N)) = (V″, A″, ρ″, φ″) by identifying each such vertex with its unique child, forming new vertices [1,9] and [3,11] in V″. Note that in S(R(N)) the leaf [3,11] … The following theorem shows that if N is a DCTC X-network, then S(R(N)) is a normal network. This shows a close relationship between DCTC networks and normal networks, and it will be the basis for an expanded analysis in Sect. 6.
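The two steps of this computation, removing all redundant arcs and then contracting every arc whose tail has out-degree one, can be sketched as follows. The representation, the function names, and the example network are hypothetical; merged vertices are returned as frozensets, in the spirit of the labels [1,9] and [3,11] above.

```python
from collections import defaultdict

def remove_redundant(children):
    """R(N): delete every arc (a, b) for which another path from a to b exists."""
    def other_path(a, b):
        stack = [c for c in children[a] if c != b]
        seen = set(stack)
        while stack:
            v = stack.pop()
            if v == b:
                return True
            for c in children.get(v, []):
                if c not in seen:
                    seen.add(c)
                    stack.append(c)
        return False
    return {a: [b for b in kids if not other_path(a, b)]
            for a, kids in children.items()}

def apply_S(children):
    """S(N): contract every arc (u, w) with outdeg(u) = 1.

    Returns (psi, arcs); each new vertex is the frozenset of merged vertices."""
    vertices = set(children) | {w for kids in children.values() for w in kids}
    rep = {v: v for v in vertices}

    def find(v):
        while rep[v] != v:
            v = rep[v]
        return v

    for u, kids in children.items():
        if len(kids) == 1:
            ru, rw = find(u), find(kids[0])
            if ru != rw:
                rep[ru] = rw
    members = defaultdict(set)
    for v in vertices:
        members[find(v)].add(v)
    psi = {v: frozenset(members[find(v)]) for v in vertices}
    arcs = {(psi[u], psi[w]) for u, kids in children.items()
            for w in kids if psi[u] != psi[w]}
    return psi, arcs

# Hypothetical DCTC network with one redundant arc (r, h).
M = {"r": ["a", "b", "h"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"],
     "x": [], "y": [], "z": []}
psi, arcs = apply_S(remove_redundant(M))
print(sorted(psi["z"]), len(arcs))  # ['h', 'z'] 6
```

In this toy case the leaf z is merged with its hybrid parent h, so the resulting leaf {h, z} is hybrid, and no two vertices of the result share a cluster.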
) is a normal X -network, possibly having some leaves that are hybrid, and containing no vertex with out-degree one. No two vertices have the same cluster.
Proof (1) It is immediate that R(N ) contains no redundant arcs, so we must only show R(N ) is TC. Since N is tree-child, each vertex u that is not a leaf has a tree-child w.
The arc (u, w) cannot be redundant. Otherwise, if (u, w) is redundant in N , then by redundancy there is a lengthening path starting at u and ending at w but not including the arc (u, w). Hence, indeg(w) ≥ 2, a contradiction. It follows that (u, w) remains an arc in R(N ), so w is a tree-child of u in R(N ).
(2) Note that x ∈ cl(u; N ) iff there is a path in N from u to φ(x). A path from u to φ(x) of maximal length consists of only non-redundant arcs and remains a path in R(N ). Hence, cl(u; N ) ⊆ cl(u; R(N )). But trivially every path from u to φ(x) in R(N ) is also such a path in N , proving Part (2).

Counting Vertices in DCTC X-Networks
In this section, we prove that the number of vertices in a DCTC X-network with n leaves is quadratic in n. We also study when a vertex is the mrca(x, y) of two leaves x and y. A leaf φ(x) is post-hybrid if its parent p(x) is hybrid with out-degree one, and β(N) is the number of post-hybrid leaves, i.e., the number of leaves whose parent is hybrid with out-degree one. Figure 5 shows a DCTC X-network N with β = 1, from the single post-hybrid leaf 3, since 10 = p(3) has out-degree one. Note that 2 is not post-hybrid even though 8 = p(2) is hybrid, since outdeg(8) = 2.
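Counting post-hybrid leaves needs only the parent lists and out-degrees. The sketch below assumes the hypothetical child-dictionary representation and, as in Sect. 2, that every leaf has in-degree one; the names and example are illustrative.

```python
from collections import defaultdict

def beta(children, leaves):
    """Count post-hybrid leaves: leaves whose parent is hybrid with out-degree one.

    Assumes every leaf has in-degree one, as in Sect. 2."""
    parents = defaultdict(list)
    for u, kids in children.items():
        for w in kids:
            parents[w].append(u)
    count = 0
    for x in leaves:
        (p,) = parents[x]          # the unique parent p(x)
        if len(parents[p]) >= 2 and len(children[p]) == 1:
            count += 1
    return count

N = {"r": ["a", "b"], "a": ["x", "h"], "b": ["h", "y"], "h": ["z"],
     "x": [], "y": [], "z": []}
print(beta(N, ["x", "y", "z"]))  # 1: only z, since p(z) = h is hybrid with one child
```

If h were given a second child, z would behave like leaf 2 in Fig. 5: its parent would still be hybrid but no longer of out-degree one, so z would not be post-hybrid.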
For such x, there are two vertices p(x) and φ(x) with the same cluster {x}. Hence, (1) holds.
(2) Remove all redundant arcs of N to obtain R(N ). By definition, R(N ) contains no redundant arcs. Clearly R(N ) is tree-child since no arc (u, w) ∈ A which is an arc to a tree-child w of u is redundant; otherwise there would be a lengthening path from u to w which does not contain (u, w), hence another parent of w. It follows that R(N ) is normal.
Moreover, the removal of redundant arcs does not change cl(u) for any vertex u. Hence, R(N) remains DC. For every arc (u, w) such that cl(w) is not {x} for any x ∈ X, we must have outdeg(u) > 1; otherwise cl(u) = cl(w), a contradiction. Suppress all the vertices of out-degree one, identifying the ends of any arc (u, w) with cl(u) = cl(w) = {x} for some x ∈ X. The result is S(R(N)), which will have exactly c(N) vertices, one for each cluster. But S(R(N)) is normal and has no vertices of out-degree one. By a result in Willson (2010), it has at most n(n + 1)/2 vertices, proving Part (2). (The result stated in Willson (2010) differs slightly since, in that paper, X includes the root as well as the leaves.) Part (3) follows immediately.
(4) Clearly β(N ) ≤ n since there are n leaves. Suppose β(N ) = n. Then for every x ∈ X , p(x) is hybrid with out-degree 1. Yet N is tree-child and there must be a tree-path from the root ρ to some leaf. Since every p(x) is hybrid, this is not possible. Hence, β(N ) < n.
If N is tree-child, then it is immediate that for any vertex u there is a tree-child path from u to a leaf.
The notion of the most recent common ancestor mrca(x, y) of two leaves x, y ∈ X is defined in Sect. 2. It need not exist in general, but when it does it can provide useful information. Traits shared by species x and y may sometimes be traced back to mrca(x, y) or earlier. It is therefore useful to know when mrca(x, y) exists.
The following theorem shows that in a DCTC X-network every vertex with out-degree at least two has the form mrca(x, y) for distinct x, y ∈ X. This result is interesting for its own sake, and it also improves the upper bound given in Theorem 5.1(5) for the number of vertices.
Theorem 5.2 Suppose N = (V , A, ρ, φ) is a DCTC X -network. Suppose u ∈ V has out-degree at least two. Let its distinct children be c 1 , c 2 , . . . , c m .
Assume c_1 is a tree-child of u. Assume cl(c_2) ⊈ cl(c_1). Let u = u_0, u_1 = c_1, …, u_k = φ(x) be a tree-child path from u to φ(x) through c_1, and let c_2 = v_0, v_1, …, v_j = φ(y) be a tree-child path from c_2 to φ(y). Then … contradicting DC unless u is hybrid with out-degree one. But the latter contradicts that u has out-degree at least 2. Hence Part (1) is true.
Part (2) follows since in the proof of Part (1) the choice of c_1 was arbitrary. The second half of (3) is immediate since c_2 is a child of u. Next I claim that there are no i and s such that u_i = v_s. Otherwise suppose u_i = v_s and i is minimal satisfying this condition. If i = 0, then there is a cycle u to c_2 to v_s = u, a contradiction. If i = 1 and s ≥ 1, then c_1 has the parents u and v_{s−1}, which are distinct by the choice of i, contradicting that c_1 is a tree-child of u. If i = 1 and s = 0, then c_1 = c_2, contrary to hypothesis. Thus, i > 1. If s ≥ 1, then u_i has the parents u_{i−1} and v_{s−1}, which are distinct by choice of i, contradicting that u_i is a tree-child. Thus u_i = v_0 = c_2, so u_i has the parents u_{i−1} and u, which is not possible since i > 1. This proves the claim. This also proves that x ≠ y, so (3) is true.
For (4) suppose w ∈ V satisfies {x, y} ⊆ cl(w). By Theorem 4.1 either w = u_i for some i, 1 ≤ i ≤ k, or else w ≤ u. If w ≤ u, then (4) is true. Suppose instead w = u_i for some i, 1 ≤ i ≤ k (so c_1 ≤ w). By Theorem 4.1 either w = v_s for some s satisfying 1 ≤ s ≤ j, or else w ≤ c_2.
Consider the case where w = v_s. Then w has the parents u_{i−1} and v_{s−1}, contradicting that w = u_i is a tree-child. The remaining possibility is w ≤ c_2. Hence, c_1 ≤ w ≤ c_2, so cl(c_2) ⊆ cl(c_1). This contradicts the choice of c_2, proving Part (4). Part (5) then follows from Part (4).
Note in the proof that x cannot be post-hybrid, but it is possible that c_2 = p(y) and y is post-hybrid.
Fig. 6 A DCTC X-network N in which mrca(2, 3) does not exist. Note also that 6 is not visible to 2 or to 3, and 7 is not visible to 2 or to 3.
Remark While Theorem 5.2 says that in a DCTC network many vertices have the form mrca(x, y) for leaves x, y, it does not say that mrca(x, y) exists for all x, y ∈ X. In Fig. 6, vertices 5, 6, and 7 are all the common ancestors of 2 and 3. We show that mrca(2, 3) does not exist. By the definition in Sect. 2, if u is a common ancestor of 2 and 3 and u = mrca(2, 3), then for every other common ancestor v of 2 and 3 we must have v ≤ u. But u ≠ 5, since it is false that 6 ≤ 5; moreover, u ≠ 6, since it is false that 7 ≤ 6; and u ≠ 7, since it is false that 6 ≤ 7. Hence mrca(2, 3) does not exist. Briefly, 6 and 7 are both common ancestors of 2 and 3 as recent as possible in N, but neither is an ancestor of the other.
A related observation in Fig. 6 is that vertex 6 is not visible to either 2 or 3. It is not visible to 2 since there is the path 5, 7, 8, 2 from the root 5 to 2 that misses 6; it is not visible to 3 because there is the path 5, 7, 9, 3 that misses 6. But 6 is visible to 1 since every path from 5 to 1 includes vertex 6.
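The definitions of mrca and of visibility used in the Remark can be checked mechanically. The minimal Python sketch below is our own illustration, not the author's code: the arc set is a hypothetical network we built to have the properties stated for Fig. 6 (root 5, 6 = mrca(1, 2), 7 = mrca(3, 4), and the paths 5, 7, 8, 2 and 5, 7, 9, 3), and the actual figure may differ. As in the text, u ≤ w means that u is an ancestor of w or u = w; the helper names `below`, `mrca`, and `visible` are hypothetical.

```python
from collections import defaultdict

# Hypothetical network with the properties stated for Fig. 6 (a reconstruction, not the figure).
ARCS = {("5", "6"), ("5", "7"), ("6", "1"), ("6", "8"), ("6", "9"),
        ("7", "8"), ("7", "9"), ("7", "4"), ("8", "2"), ("9", "3")}
ROOT = "5"
CHILDREN = defaultdict(set)
for parent, child in ARCS:
    CHILDREN[parent].add(child)
VERTICES = {x for arc in ARCS for x in arc}


def below(u, blocked=None):
    """All w with u <= w (u included), never entering the vertex `blocked`."""
    seen, stack = {u}, [u]
    while stack:
        for c in CHILDREN.get(stack.pop(), ()):
            if c != blocked and c not in seen:
                seen.add(c)
                stack.append(c)
    return seen


def mrca(x, y):
    """The common ancestor u with v <= u for every common ancestor v, or None."""
    common = [u for u in VERTICES if {x, y} <= below(u)]
    best = [u for u in common if all(u in below(v) for v in common)]
    return best[0] if best else None


def visible(v, x):
    """True if every path from ROOT to leaf x passes through v."""
    return v == ROOT or x not in below(ROOT, blocked=v)


print(mrca("1", "2"), mrca("3", "4"), mrca("2", "3"))
print(visible("6", "1"), visible("6", "2"), visible("6", "3"))
```

Because 6 and 7 are incomparable common ancestors of 2 and 3, `mrca("2", "3")` returns None, matching the argument in the Remark; deleting 6 still leaves paths from 5 to 2 and to 3, so 6 is visible to 1 only.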
For Fig. 6, the proof of Theorem 5.2 merely says that 6 = mrca(1, 2) and 7 = mrca(3, 4). Note that the network of Fig. 6 is both normal and DCTC.

(1) If u ∈ V satisfies outdeg(u) ≥ 2, then there exist distinct x, y ∈ X such that u = mrca(x, y). Moreover, at least one of x and y is not post-hybrid.
(2) If u ∈ V satisfies outdeg(u) = 1, then there exists x ∈ X such that u = p(x) and u is hybrid.
(3) The only vertices which do not have the form mrca(x, y) for distinct x, y ∈ X are the leaves φ(x) and the vertices p(x) that are hybrid with out-degree one, both having cluster {x}.

Proof
(1) follows from the theorem. If u has out-degree one, let c be its unique child. Then cl(u) = cl(c). Since N is DC, there exists x ∈ X such that u = p(x), c = φ(x), and u is hybrid, proving Part (2). Then Part (3) is immediate.
If N is DCTC, we also have the following result involving mrca(B), where B ⊆ X. Let N = (V, A, ρ, φ) be a DCTC X-network. Every vertex u satisfies u = mrca(cl(u)), with the exception of the vertices u = p(x) which are hybrid with out-degree one.
Proof For (1), it is immediate that u ≤ φ(x) for every x ∈ cl(u). Thus u is a common ancestor of cl(u). Conversely, suppose w ≤ φ(x) for all x ∈ cl(u). By Corollary 5.3 there are y, z ∈ X such that u = mrca(y, z); since y, z ∈ cl(u), we have w ≤ φ(y) and w ≤ φ(z). Hence, w ≤ u. Thus, u = mrca(cl(u)).
If x ∈ P, then every path to φ(x) from another vertex includes the hybrid vertex p(x). It follows that there is no tree-child path from u ∈ $V_2$ to φ(x) for x ∈ P, only from p(x) to φ(x).
For each vertex u ∈ $V_2$, we first choose a tree-child $c_1$ of u, leading to a tree-child path from u to φ(x). From another child $c_2$ of u (such that it is false that $cl(c_2) \subseteq cl(c_1)$), we obtain a tree-child path to φ(y), giving an allowed 2-set {x, y}. Note that x cannot be in P. If $c_2$ has out-degree one, then $c_2 = p(y)$ and y ∈ P; otherwise y is not in P.
The number of 2-sets {x, y} with no member of P is $\binom{n-\beta}{2}$. The number of 2-sets with exactly one member of P is $\beta(n-\beta)$. Hence, the number of allowed 2-sets is $\binom{n-\beta}{2} + \beta(n-\beta)$. The number of vertices of out-degree 1 is |P| = β, and the number of leaves is n. Hence, the number of vertices is

$$|V| = |V_2| + |P| + n \le \binom{n-\beta}{2} + \beta(n-\beta) + \beta + n = \frac{n^2 - \beta^2 + 3\beta + n}{2}.$$
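The algebra in the last display, and the maximization over β that yields the value $(n^2 + n + 2)/2$ used in Sect. 7, can be checked numerically. This is a small sketch of our own; the function names are hypothetical, and the code only evaluates the bound, saying nothing about which values of β are realizable.

```python
from math import comb


def count_bound(n: int, beta: int) -> int:
    """C(n - beta, 2) + beta*(n - beta) + beta + n, the count in the proof above."""
    return comb(n - beta, 2) + beta * (n - beta) + beta + n


def closed_form(n: int, beta: int) -> int:
    """(n^2 - beta^2 + 3*beta + n) / 2; the numerator is always even."""
    return (n * n - beta * beta + 3 * beta + n) // 2


def best_bound(n: int) -> int:
    """Largest value of the bound over 0 <= beta <= n."""
    return max(count_bound(n, b) for b in range(n + 1))


for n in range(2, 8):
    assert all(count_bound(n, b) == closed_form(n, b) for b in range(n + 1))
    print(n, best_bound(n), (n * n + n + 2) // 2)
```

The maximum is attained at β = 1 or β = 2, since $3\beta - \beta^2 \le 2$ for every integer β. For n = 2 the bound evaluates to 4, which, as noted after Theorem 7.1, is not attained.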
For the DCTC X -network in Fig. 5, n = 4 and there are 11 vertices, so the upper bound of Theorem 5.6 is tight in this example.
We show in Theorem 7.1 that the upper bound in Theorem 5.6 is tight for n ≥ 3.

DCTC Networks with the Same Clusters
Several different DCTC X-networks $N_1, N_2, \ldots, N_k$ can all satisfy $d_{RF}(N_i, N_j) = 0$, so that they have the same clusters. In this section, we study their relationship. Our main theorem for this section states that they all have the same $S(R(N_i))$. Their differences will involve redundant arcs. For example, consider Fig. 7, where X = {1, 2, 3, 4}. In L, p(3) = 6, while in M, p(3) = 8; the extra vertex 8 in M allows the redundant arc (5, 8). The redundant arc (5, 7) in N did not require a new vertex. In general, adding a redundant arc between two existing vertices in a DCTC X-network N to create M does not change the set of clusters and hence yields $d_{RF}(M, N) = 0$. One must take care that the result remains tree-child.
Suppose X is a nonempty set. Let C be a collection of nonempty subsets of X such that {x} ∈ C for each x ∈ X, and X ∈ C. Baroni et al. (2004) construct an X-network which we shall denote Reg(C). The vertex set will be the set C, and Reg(C) is the cover digraph of C. More explicitly, Reg(C) = (C, A, X, φ), where there is an arc $(C_1, C_2) \in A$ iff (a) $C_2 \subsetneq C_1$, and (b) there is no $C_3 \in C$ distinct from $C_1$ and $C_2$ such that $C_1 \supsetneq C_3 \supsetneq C_2$. The root is X, and the map φ: X → C is φ(x) = {x}.
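The cover-digraph construction of Reg(C) can be sketched in a few lines of Python. The function names and the family C below are hypothetical and chosen for illustration only; this is not code from Baroni et al. (2004). Clusters are represented as frozensets, so `c2 < c1` is the proper-subset test $C_2 \subsetneq C_1$.

```python
from itertools import product


def reg_arcs(collection):
    """Cover digraph of a family of clusters: (C1, C2) is an arc iff C2 is a proper
    subset of C1 and no member C3 of the family satisfies C2 < C3 < C1."""
    family = {frozenset(s) for s in collection}
    return {(c1, c2) for c1, c2 in product(family, family)
            if c2 < c1 and not any(c2 < c3 < c1 for c3 in family)}


def name(cluster):
    """Label a cluster by its sorted elements, e.g. {2, 3, 4} -> '234'."""
    return "".join(str(x) for x in sorted(cluster))


# Hypothetical family on X = {1, 2, 3, 4}; it contains X and every singleton, as required.
C = [{1}, {2}, {3}, {4}, {1, 2}, {3, 4}, {2, 3, 4}, {1, 2, 3, 4}]
arcs = {(name(a), name(b)) for a, b in reg_arcs(C)}
print(sorted(arcs))
```

In this family the leaf {2} receives two parents, {1, 2} and {2, 3, 4}, so Reg(C) has a hybrid leaf, which is the situation addressed by the reconstruction recipe of this section.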
Following are some properties of Reg(C) from Baroni et al. (2004): (1) Reg(C) is a regular X -network (possibly having hybrid leaves).
(2) An X -network N (possibly having hybrid leaves) with cluster set Cl(N ) is regular iff N is isomorphic with Reg(Cl(N )).
Theorem 6.1(2) implies that, given a DCTC X-network N, all DCTC X-networks M such that $d_{RF}(N, M) = 0$ can be found as follows. Let $M_0$ = Reg(Cl(N)). If there is any leaf u of $M_0$ that is hybrid, then there exists a unique x ∈ X such that u = φ(x). For each such x, modify $M_0$ by adding a new leaf as φ(x) together with the new arc (u, φ(x)), making u = p(x) and producing a new network $M_1$. Now $M_1$ is a DCTC X-network. Next, recursively adjoin redundant arcs to $M_1$, taking care that the result will still be tree-child and DC. The redundant arcs can be of the form (a, b), where a and b are vertices of $M_1$. Alternatively, they can be of the form (a, b), where a is a vertex of $M_1$ and b is a new vertex in the middle of what was the arc (p(x), φ(x)), where outdeg(p(x)) > 1. Any network $M_k$ so obtained will satisfy $d_{RF}(M_k, N) = 0$.
For example, consider Fig. 8 again. Given N, we compute Cl(N) and then Reg(Cl(N)). Suppose we want to reconstruct N. Let $M_0$ = Reg(Cl(N)). Then $M_1$ replaces 2 by 2′ and adds the arc (2′, 2). Next we subdivide (34, 3) at 3′ and adjoin the redundant arc (234, 3′). At this stage, we have reconstructed N. We could continue to get another DCTC X-network with the same clusters as N by adjoining new redundant arcs (1234, 2′) and/or (1234, 3′). We could not, however, adjoin a new redundant arc (1234, 34), since then all children of 234 would be hybrid and the result would not be tree-child.
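The first two steps of this recipe can be sketched in Python, again on a hypothetical network (the Reg(C) of the family {1}, {2}, {3}, {4}, {1, 2}, {3, 4}, {2, 3, 4}, {1, 2, 3, 4}, not the clusters of Fig. 8). The function names are our own. When adjoining an arc between existing vertices, the sketch checks only that the arc is redundant (so the clusters, and hence the DC property, are unchanged) and that the result is still tree-child; the subdivision step, which creates a new vertex in the middle of (p(x), φ(x)), is not implemented.

```python
from collections import defaultdict


def parents(arcs):
    par = defaultdict(set)
    for u, v in arcs:
        par[v].add(u)
    return par


def is_tree_child(arcs):
    """Every non-leaf vertex has at least one child of in-degree one."""
    par, kids = parents(arcs), defaultdict(set)
    for u, v in arcs:
        kids[u].add(v)
    return all(any(len(par[c]) == 1 for c in cs) for cs in kids.values())


def below(arcs, u):
    """Proper descendants of u."""
    seen, stack = set(), [u]
    while stack:
        x = stack.pop()
        for a, b in arcs:
            if a == x and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def split_hybrid_leaves(arcs, leaves):
    """M_0 -> M_1: a hybrid leaf x is renamed x' (it becomes p(x)) and gets a new leaf x."""
    hybrid = {x for x in leaves if len(parents(arcs)[x]) > 1}
    ren = {x: x + "'" for x in hybrid}
    new = {(ren.get(u, u), ren.get(v, v)) for u, v in arcs}
    return new | {(ren[x], x) for x in hybrid}


def add_redundant_arc(arcs, a, b):
    """Adjoin (a, b) only if b already lies below a and the result is still tree-child."""
    if (a, b) in arcs or b not in below(arcs, a):
        return None
    new = arcs | {(a, b)}
    return new if is_tree_child(new) else None


M0 = {("1234", "12"), ("1234", "234"), ("12", "1"), ("12", "2"),
      ("234", "2"), ("234", "34"), ("34", "3"), ("34", "4")}
M1 = split_hybrid_leaves(M0, {"1", "2", "3", "4"})
print(is_tree_child(M1))
print(add_redundant_arc(M1, "1234", "34"))
print(add_redundant_arc(M1, "1234", "2'") is not None)
```

Adjoining (1234, 34) is rejected for the same reason given in the text for Fig. 8: both children of 234 would then be hybrid.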

The Upper Bound on the Number of Vertices is Tight
Theorem 5.6 asserted that if N is a DCTC X-network with n leaves, then the number of vertices is at most $(n^2 + n + 2)/2$. In this section we show that this upper bound is tight. We shall construct a sequence of DCTC networks $L_n$, for n ≥ 3, where $L_n$ has n leaves and $v_n = (n^2 + n + 2)/2$ vertices. It turns out that $L_n$ contains no redundant arcs, hence is also normal. The construction mimics a construction in Bickner (2012) of interesting normal networks. The construction will be inductive.

Fig. 9 The DCTC networks $L_3$ and $L_4$. $L_n$ has $v_n = (n^2 + n + 2)/2$ vertices and n leaves

Fig. 10 The DCTC networks $L_5$ and $L_6$. $L_n$ has $v_n = (n^2 + n + 2)/2$ vertices and n leaves

We start with $L_3$, shown in Fig. 9 (left), with 3 leaves. It is easily seen to be DCTC and normal and has 7 vertices; note that $v_3 = 7$. Also note that 2 is post-hybrid and p(2) = p2 is the only hybrid vertex. The root is r3, and r3m1 is the child of r3 that is an ancestor of 3.
To obtain $L_4$, shown in Fig. 9 (right), we add 4 new vertices to $L_3$: a new root r4 with a tree-child path of new vertices r4, r4m1, r4m2, 4. Note that r4 also has the tree-child r3, r4m1 also has the child r3m1, and r4m2 also has the child p2. Note that r3m1 has become hybrid with tree-child 3, while r3 still has the tree-child p1. Since p2 was already hybrid in $L_3$, r3m1 is the only new hybrid. Every non-leaf vertex of $L_3$ still has a tree-child in $L_4$, and each new non-leaf vertex has a tree-child on the new tree-child path. Hence $L_4$ is TC. Finally, $L_4$ has 7 + 4 = 11 vertices, and $v_4 = 11$. Since it has no redundant arcs, it is also normal.
Given $L_n$, we show how to define $L_{n+1}$. The process is illustrated in Fig. 10 by showing $L_5$ and $L_6$. We modify $L_n$ by letting n′ = n + 1 and adding n′ new vertices along a tree-child path rn′, rn′m1, rn′m2, ..., rn′m(n′ − 2), n′, hence with new arcs (rn′, rn′m1), (rn′m1, rn′m2), ..., (rn′m(n′ − 2), n′). We also add the arcs (rn′, rn), (rn′m1, rnm1), (rn′m2, r(n − 1)m1), (rn′m3, r(n − 2)m1), ..., (rn′m(n′ − 2), p2). The only new hybrid vertex is rnm1, but rn still has a tree-child (r(n − 1), or p1 when n = 3). All non-leaf vertices of $L_n$ still have a tree-child, and the new vertices of $L_{n+1}$ have a tree-child from the tree-child path, so $L_{n+1}$ is TC. Since it has no redundant arcs, it is also normal.
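The inductive step can be checked mechanically for small n. The sketch below builds our reading of $L_n$ and reports its vertex count, leaf count, the tree-child property, the number of redundant arcs, and the DC condition as we read it (equal clusters only for a leaf x and its hybrid parent p(x) of out-degree one). The structure of $L_3$ is an assumption inferred from the text (root r3, r3m1 above 3, p1 a tree-child of r3, p2 = p(2) the only hybrid with parents r3m1 and p1), since Fig. 9 is not reproduced here; vertex names follow the text and the function names are hypothetical.

```python
from collections import defaultdict


def build_L(n):
    """Our reading of the inductive construction of L_n (n >= 3): vertex -> list of children."""
    # Assumed L_3: r3 -> r3m1, p1; r3m1 -> 3, p2; p1 -> 1, p2; p2 -> 2 (p2 is the only hybrid).
    ch = {"r3": ["r3m1", "p1"], "r3m1": ["3", "p2"], "p1": ["1", "p2"],
          "p2": ["2"], "1": [], "2": [], "3": []}
    for m in range(3, n):
        k = m + 1  # L_k from L_m: k new vertices on the path rk, rkm1, ..., rkm(k-2), k
        path = [f"r{k}"] + [f"r{k}m{j}" for j in range(1, k - 1)] + [str(k)]
        for a, b in zip(path, path[1:]):
            ch.setdefault(a, []).append(b)
        ch[str(k)] = []
        ch[f"r{k}"].append(f"r{m}")  # arc (rk, rm)
        for j in range(1, k - 1):    # arcs (rkmj, r(k-j)m1); the last one goes to p2
            ch[f"r{k}m{j}"].append(f"r{k - j}m1" if k - j >= 3 else "p2")
    return ch


def check(n):
    ch = build_L(n)
    par = defaultdict(list)
    for u, cs in ch.items():
        for c in cs:
            par[c].append(u)
    memo = {}

    def desc(u):  # proper descendants of u
        if u not in memo:
            memo[u] = set(ch[u]).union(*(desc(c) for c in ch[u]))
        return memo[u]

    leaves = {u for u, cs in ch.items() if not cs}
    cl = {u: frozenset(desc(u) & leaves) or frozenset({u}) for u in ch}
    groups = defaultdict(list)
    for u, c in cl.items():
        groups[c].append(u)
    return {
        "vertices": len(ch),
        "leaves": len(leaves),
        # tree-child: every non-leaf has a child with a single parent
        "tree_child": all(any(len(par[c]) == 1 for c in cs) for cs in ch.values() if cs),
        # (u, v) is redundant iff v lies below another child w of u
        "redundant_arcs": sum(1 for cs in ch.values() for v in cs
                              if any(v in desc(w) for w in cs if w != v)),
        "dc": all(
            len(g) == 1
            or (len(g) == 2 and any(ch[p] == [x] and not ch[x] and len(par[p]) >= 2
                                    for p, x in (g, g[::-1])))
            for g in groups.values()
        ),
    }


for n in range(3, 8):
    print(n, check(n), (n * n + n + 2) // 2)
```

Each printed line pairs n with the computed checks and the target value $v_n$; under the assumed $L_3$, the counts grow by n + 1 at each step, which is exactly $v_{n+1} - v_n$.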
We have proved the following:

Theorem 7.1 For n ≥ 3, there exists a network with n leaves which is both DCTC and normal and which has $v_n = (n^2 + n + 2)/2$ vertices. Hence, the upper bound in Theorem 5.6 is tight for all n ≥ 3.
If n = 2, a DCTC X-network with n leaves has at most 3 vertices, and 3 < 4 = $v_2$, so the restriction on n is needed.

DCTC Networks from Normal Networks
Recall from Theorem 3.5 that when N is an acyclic X -network there is an X -network SCD(N ) which is successively-cluster-distinct (SCD) with other interesting properties. In this section, we see that, given a normal network N , it follows that SCD(N ) is a DCTC network. If (u, v) ∈ D and outdeg(u) = 1, then (u, v) is contracted in the formation of SCD(N ).
If (u, v) ∈ A and outdeg(u) > 1, I claim cl(u) ≠ cl(v). To see this, let w be another child of u besides v. Since N is normal, we may assume at least one of {v, w} is a tree-child of u. Choose a tree-child path from v to the leaf x and from w to the leaf y. Note that u ≤ x and u ≤ y. From Willson (2010), u = mrca(x, y). But if cl(u) = cl(v), then y ∈ cl(v), so v ≤ x and v ≤ y, whence v ≤ u by the mrca property. Since v is a child of u, this yields a cycle, a contradiction, proving cl(u) ≠ cl(v). Hence (u, v) ∉ D. By Theorem 4.3 of Willson (2022), $M_D(N)$ is SCD, although possibly containing a trivial vertex of the form p(x) for some x ∈ X. In this situation, let the unique parent of p(x) be denoted u(x). Now Theorem 4.4 of Willson (2022) shows that SCD(N) is obtained by contracting each such arc (u(x), p(x)) to suppress the trivial vertex p(x). Thus, SCD(N) is SCD.
But it is clearly tree-child as well since whenever we contracted (u, v) ∈ D into a point [u, v] the tree-child of v becomes a tree-child for [u, v]. And whenever we contracted (u(x), p(x)) into [u(x), p(x)], φ(x) becomes a tree-child of [u(x), p(x)]. Hence SCD(N ) is DCTC by Corollary 4.3. Since N contained no redundant arcs, the same is true about SCD(N ).
The operator S′. The operator S is described in Sect. 4. Here we give a slight modification. Given a normal X-network N, let S′(N) be defined by first contracting the arcs in D = {(u, v) ∈ A : outdeg(u) = 1 and v is not a leaf} and then contracting any remaining arcs (u(x), p(x)) where, for x ∈ X, p(x) is a trivial vertex whose unique parent u(x) has out-degree at least 2.
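A minimal sketch of the two contraction stages of S′ follows. The union-find helper `contract` is our own illustration of $M_D$ (merge the two ends of every listed arc, then drop arcs inside a merged class and collapse duplicates); the bracket naming mirrors the notation [u, v] in the text. The toy input is hypothetical and chosen only to trigger both stages; it is not claimed to be a normal network.

```python
from collections import Counter


def contract(arcs, merge):
    """M_D: identify the two ends of every arc in `merge`; merged vertices are named '[a,b,...]'."""
    rep = {}

    def find(x):
        while rep.get(x, x) != x:
            x = rep[x]
        return x

    for u, v in merge:
        ru, rv = find(u), find(v)
        if ru != rv:
            rep[rv] = ru
    classes = {}
    for arc in arcs:
        for x in arc:
            classes.setdefault(find(x), set()).add(x)

    def name(x):
        members = sorted(classes[find(x)])
        return members[0] if len(members) == 1 else "[" + ",".join(members) + "]"

    return {(name(u), name(v)) for u, v in arcs if find(u) != find(v)}


def s_prime(arcs, leaves):
    """S'(N): contract D = {(u, v): outdeg(u) = 1, v not a leaf}, then every arc
    (u(x), p(x)) where p(x) is trivial and outdeg(u(x)) >= 2."""
    out = Counter(u for u, _ in arcs)
    arcs = contract(arcs, [(u, v) for u, v in arcs if out[u] == 1 and v not in leaves])
    out, indeg = Counter(u for u, _ in arcs), Counter(v for _, v in arcs)
    parent = {v: u for u, v in arcs}  # only read for vertices of in-degree one
    trivial = [(parent[p], p) for p, x in arcs
               if x in leaves and out[p] == 1 and indeg[p] == 1 and out[parent[p]] >= 2]
    return contract(arcs, trivial)


toy = {("r", "a"), ("r", "t"), ("a", "b"), ("b", "1"), ("b", "2"), ("t", "3")}
print(sorted(s_prime(toy, {"1", "2", "3"})))
```

Here (a, b) lies in D because a has out-degree one and b is not a leaf, while t survives the first stage (its child 3 is a leaf) and is then suppressed as a trivial p(3) whose parent r has out-degree 2.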
The proof of Theorem 8.1 shows that if N is normal, then S′(N) is a simpler construction of SCD(N).

Corollary 8.2 Let N = (V, A, ρ, φ) be a normal X-network. Then S′(N) = SCD(N), and it is a DCTC X-network that contains no redundant arcs.
We will illustrate the use of Corollary 8.2 in Examples 1, 2 and 3.

Finding a Standard DCTC X-Network from a Given X-Network
Given an X-network N, this section shows how to produce a uniquely determined X-network which is DCTC and which we denote DCTC(N). The procedure resembles that in Willson (2022) used to produce a uniquely determined network Norm(N) which is normal. While the procedure in Willson (2022) involves the removal of redundant arcs, the procedure in this section does not involve explicit removal of redundant arcs and consequently has some advantages. Let N be an X-network. A vertex v of N is a tree-child obstacle if (1) v is not a leaf; and (2) every child of v is hybrid. Hence, whenever (v, u) is an arc, u has a parent other than v.
An X -network N is tree-child obstacle-free if it contains no tree-child obstacle.

Theorem 9.1 Suppose N is an X -network that is tree-child obstacle-free. Then N is a tree-child X -network.
Proof By hypothesis, for every vertex v that is not a leaf, there is an arc (v, c) with indeg(c) = 1. It follows that c is a tree-child of v. Hence, N is tree-child.
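Detecting tree-child obstacles is a single pass over the arcs. The sketch below, with hypothetical function names and a hypothetical network, flags each non-leaf vertex all of whose children have in-degree at least 2; an empty result means the network is tree-child, in line with Theorem 9.1.

```python
from collections import defaultdict


def tree_child_obstacles(arcs):
    """Non-leaf vertices all of whose children are hybrid (in-degree >= 2)."""
    kids, indeg = defaultdict(list), defaultdict(int)
    for u, v in arcs:
        kids[u].append(v)
        indeg[v] += 1
    return sorted(u for u, cs in kids.items() if all(indeg[c] >= 2 for c in cs))


def is_tree_child(arcs):
    """No obstacle means every non-leaf vertex has a child of in-degree one."""
    return not tree_child_obstacles(arcs)


# Hypothetical network: both children h1 and h2 of o also have the parent s.
N = {("r", "o"), ("r", "s"), ("o", "h1"), ("o", "h2"), ("s", "h1"),
     ("s", "h2"), ("s", "3"), ("h1", "1"), ("h2", "2")}
print(tree_child_obstacles(N), is_tree_child(N))
print(is_tree_child(N - {("s", "h1"), ("s", "h2")}))
```

Removing the two arcs from s makes h1 and h2 tree-children of o, so the obstacle disappears.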
Given an X -network N , suppose we seek a related DCTC X -network. Our strategy will be to compute SCD(N ) to make it SCD. Then we recursively remove tree-child obstacles until there are no more obstacles. If we seek to obtain a uniquely determined tree-child network we are careful not to make arbitrary choices of which arcs to merge.
Just as for pre-normal obstacles in Willson (2022), there are different types of tree-child obstacles.
Let N be an X-network. Suppose c is a tree-child obstacle. An allowable 1-fold parent chain of c is a path $p_1, c$ where $p_1$ is a parent of c, $(p_1, c)$ is not redundant, and $p_1$ has a tree-child d ≠ c. An obstacle c is of type 1 if c has an allowable 1-fold parent chain. If c has type 1 and $p_1, c$ is an allowable parent chain, let $Dc(p_1, c) = \{(p_1, c)\}$. We will be merging the arc in $Dc(p_1, c)$. Suppose c is a tree-child obstacle. An allowable k-fold parent chain for c is a path $p_k, p_{k-1}, \ldots, p_1, p_0 = c$ such that no arc $(p_i, p_{i-1})$ is redundant and $p_k$ has a tree-child $d \neq p_{k-1}$. An obstacle c is of type k if (a) c is not of type 1, ..., k − 1; and (b) c has an allowable k-fold parent chain.
In this situation, for each such allowable k-fold parent chain write $Dc(p_k, p_{k-1}, \ldots, c) = \{(p_k, p_{k-1}), (p_{k-1}, p_{k-2}), \ldots, (p_1, c)\}$. We will be merging the arcs in $Dc(p_k, p_{k-1}, \ldots, c)$.
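Finding the type of an obstacle amounts to extending parent chains upward from c one step at a time, stopping at the first length k at which some chain ends in a vertex with a tree-child outside the chain. The brute-force sketch below is our own reading of the definitions above (exponential in the worst case, intended only for small examples); the networks and function names are hypothetical. `merged_arcs` takes the union of the Dc sets of all minimal chains, which is our assumption about how the procedure avoids choosing among chains (the VARIANT procedure described later selects a single chain instead).

```python
from collections import defaultdict


def obstacle_type(arcs, c):
    """Smallest k with an allowable k-fold parent chain p_k, ..., p_1, c, and all such chains."""
    kids, par = defaultdict(set), defaultdict(set)
    for u, v in arcs:
        kids[u].add(v)
        par[v].add(u)
    memo = {}

    def desc(u):  # proper descendants of u
        if u not in memo:
            memo[u] = set().union(*({w} | desc(w) for w in kids[u]))
        return memo[u]

    def redundant(u, v):  # another u -> v path must leave u through a child w != v
        return any(v in desc(w) for w in kids[u] if w != v)

    def tree_child_besides(u, avoid):
        return any(len(par[d]) == 1 and d != avoid for d in kids[u])

    paths, k = [[c]], 0
    while paths:
        k += 1
        # extend each chain [p_{k-1}, ..., c] by a parent p_k along a non-redundant arc
        paths = [[p] + path for path in paths for p in sorted(par[path[0]])
                 if not redundant(p, path[0])]
        chains = [path for path in paths if tree_child_besides(path[0], path[1])]
        if chains:
            return k, chains
    return None, []


def merged_arcs(chains):
    """Union of the Dc sets of the given chains."""
    return {(path[i], path[i + 1]) for path in chains for i in range(len(path) - 1)}


TYPE1 = {("r", "o"), ("r", "s"), ("o", "h1"), ("o", "h2"), ("s", "h1"),
         ("s", "h2"), ("s", "3"), ("h1", "1"), ("h2", "2")}
TYPE2 = {("r", "p1"), ("r", "s"), ("p1", "c"), ("p1", "h3"), ("c", "h1"), ("c", "h2"),
         ("s", "h1"), ("s", "h2"), ("s", "h3"), ("s", "4"),
         ("h1", "1"), ("h2", "2"), ("h3", "3")}
print(obstacle_type(TYPE1, "o"))
k, chains = obstacle_type(TYPE2, "c")
print(k, chains, sorted(merged_arcs(chains)))
```

In the second network the only tree-child of p1 is c itself, so p1, c is not allowable, and the chain must climb to r, whose tree-child s lies outside the chain; c therefore has type 2.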
Not every tree-child obstacle in an arbitrary X-network has a type: Figure 11 shows an X-network N in which 6 is a tree-child obstacle that has no type. Nevertheless, the next result shows that in an SCD X-network every tree-child obstacle has a type.

Theorem 9.2 Let N be an SCD X -network. Then every tree-child obstacle c has a unique type.
Proof It is clear that the type, if it exists, is unique. The root ρ satisfies cl(ρ) = X. Since N is SCD, every child c of ρ satisfies cl(c) ≠ X. For every x ∈ X, there is a child d of ρ satisfying x ∈ cl(d). If we choose such a child d with maximal cl(d), then (ρ, d) cannot be redundant and d has no parent other than ρ, so d is a tree-child of ρ. Since N is SCD and no child of ρ can have cluster X, ρ has at least two tree-children.
Consider a path from ρ to c which has maximal length k. Write it as $u_0 = ρ, u_1, \ldots, u_k = c$. By Lemma 2.1 of Willson (2022), every arc on this path is non-redundant. Let $p_i = u_{k-i}$, so this path is $ρ = p_k, p_{k-1}, \ldots, p_1, c$; it is an allowable k-fold parent chain of c, since ρ has a tree-child other than $p_{k-1}$. Hence, c has type ≤ k.
The next result shows a way to remove a single obstacle of type 1.

Lemma 9.3 Suppose N is an X-network and c is

(2) $N_r$ is DCTC and will be denoted DCTC(N).
(3) DCTC(N) depends only on the structure of N and not on any arbitrary choices.
(4) There is a leaf-preserving CSD map
Proof (1) By construction, $N_i$ is SCD for i ≥ 1. If $N_i$ has an obstacle c, then by Theorem 9.2 c has a type, so step (5a) can be carried out. Hence, the procedure is well-defined. Every time the procedure enters step 5, the network has at least one obstacle, so the set D in step 6 is nonempty. Then at least one arc of $N_i$ is merged in the formation of $M_D(N_i)$, so $M_D(N_i)$ has fewer vertices than $N_i$. Since N is finite, it follows that the procedure terminates.
(2) It is immediate that the output $N_r$ is SCD, since $N_r = \mathrm{SCD}(M_D(N_{r-1}))$. Moreover, $N_r$ is tree-child by Theorem 9.1, since it has no tree-child obstacles (otherwise, the procedure would have computed $N_{r+1}$). Hence, it is DCTC.
(3) is immediate since no choices are made between different obstacles or allowable parent chains for an obstacle.
Just as in Willson (2022), one can define a procedure VARIANT DCTC in which Step (5d) is omitted and Step (5c) is replaced by (5c'): Select one allowable k-fold parent chain $p_k, \ldots, c$ for c and let $Dc = \{(p_k, p_{k-1}), (p_{k-1}, p_{k-2}), \ldots, (p_1, c)\}$. The network that is output may be denoted $\mathrm{DCTC}_{V,C}(N)$ or $\mathrm{DCTC}_V(N)$, where C indicates the choice of each parent chain when more than one is possible. As in Willson (2022), $\mathrm{DCTC}_{V,C}(N)$ will depend on the choice of the parent chains (when there is more than one). On the other hand, the result may have higher resolution than DCTC(N) and may sometimes be useful.
Some detailed examples will be presented in Sect. 11.

Some Parameters of Networks
In the examples in Sect. 11, we compare several different networks for the same collection X of leaves. We use the following numerical parameters in related tables. These parameters are chosen in part because they generalize parameters useful in analyzing phylogenetic trees. Our goal is in part to see which phylogenetic networks can be most useful for analyzing gene flow in complicated situations. Quantities useful in trees can sometimes be generalized in more than one way to networks. In some cases, we will compare the parameter for a network with the corresponding parameter for rooted trees. Let N = (V , A, ρ, φ) be an X -network.
• n = |X | is the number of leaves.
• v = |V | is the number of vertices.
• a = |A| is the number of arcs.
• h is the number of hybrid vertices. For a tree, h = 0. In a DCTC network, h ≤ n − 2 by Theorem 4.4.
• r is the number of redundant arcs. For a tree or a normal network, r = 0.
• o1 is the number of vertices with out-degree one. In a tree such vertices would also have in-degree one and hence would be suppressed as trivial. Hence for a tree, o1 = 0. For a DCTC network, o1 = β(N).
• o2 is the number of vertices with out-degree 2 or higher. In a tree we expect all vertices other than leaves to have out-degree 2 or higher, so o2 = v − n. In general, o2 = v − n − o1.
• o2m is the number of vertices with out-degree 2 or greater which equal mrca(x, y) for some 2-set {x, y} from X. In a tree, o2m = o2. By Theorem 5.2, in a DCTC network o2m = o2, and the same is true for normal networks (Willson 2010).
• mrca is the number of 2-sets of leaves {x, y} ⊆ X for which mrca(x, y) exists. There are $\binom{n}{2}$ such 2-sets, and in a tree mrca = $\binom{n}{2}$. It often happens that the same vertex u equals mrca(x, y) for several different {x, y}. A biologist might be interested in mrca(x, y) in order to trace back to where features common to x and y might have originated. Networks with mrca = $\binom{n}{2}$ could be especially useful.
• c = |Cl(N)| is the number of distinct clusters cl(u) for u ∈ V. In a tree (with no vertices of out-degree one) c = v. For a DCTC network c = v − β(N) by Theorem 5.1.
• vi is the number of visible vertices. In a tree vi = v. By Corollary 4.6, vi = v in a normal or DCTC network. It is useful for all vertices to be visible.
• 0tc is the number of non-leaf vertices with no tree-child. In a tree, 0tc = 0. In a tree-child network, by definition 0tc = 0. A network with 0tc > 0 is not tree-child. It is useful for a network to have small 0tc.
• d = $d_{RF}(N, M)$ for the specified network M, when several networks related to the network N are being analyzed.
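Several of these parameters are simple counts over the arc set. The sketch below computes n, v, a, h, r, o1, o2, c, and 0tc for a network given as (parent, child) pairs; the function name and the example network are hypothetical, and the mrca, o2m, vi, and d counts are omitted for brevity.

```python
from collections import defaultdict


def parameters(arcs):
    """A few of the counts listed above for a network given as (parent, child) arcs."""
    kids, indeg, vertices = defaultdict(set), defaultdict(int), set()
    for u, v in arcs:
        kids[u].add(v)
        indeg[v] += 1
        vertices |= {u, v}
    leaves = {u for u in vertices if not kids.get(u)}
    memo = {}

    def desc(u):  # proper descendants of u
        if u not in memo:
            memo[u] = set().union(*({w} | desc(w) for w in kids.get(u, ())))
        return memo[u]

    outdeg = {u: len(kids.get(u, ())) for u in vertices}
    clusters = {frozenset({u}) if u in leaves else frozenset(desc(u) & leaves) for u in vertices}
    return {
        "n": len(leaves),
        "v": len(vertices),
        "a": len(arcs),
        "h": sum(indeg[u] >= 2 for u in vertices),
        # (u, v) is redundant iff v lies below another child of u
        "r": sum(any(v in desc(w) for w in kids[u] if w != v) for u, v in arcs),
        "o1": sum(outdeg[u] == 1 for u in vertices),
        "o2": sum(outdeg[u] >= 2 for u in vertices),
        "c": len(clusters),
        "0tc": sum(outdeg[u] > 0 and all(indeg[w] >= 2 for w in kids[u]) for u in vertices),
    }


N = {("r", "o"), ("r", "s"), ("o", "h1"), ("o", "h2"), ("s", "h1"),
     ("s", "h2"), ("s", "3"), ("h1", "1"), ("h2", "2")}
print(parameters(N))
```

For any input the identity o2 = v − n − o1 from the list holds by construction, since every vertex is a leaf, has out-degree one, or has out-degree at least two.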

Examples
This section contains three detailed examples of the calculation of DCTC(N ) from a network N , two of them using real data.
We shall occasionally compare DCTC(N ) with Norm(N ) and FHS(N ). Recall that Norm(N ) was a uniquely determined normal network constructed from N as described in (Willson 2022).
The Francis et al. (2021) "normalization" of N (denoted here as FHS(N)) has as its vertex set the set of all visible vertices of N. There is an arc (u, v) in FHS(N) between distinct vertices u and v provided there is a path in N from u to v and, in addition, there is no visible vertex w distinct from u and v such that there are paths from u to w and from w to v.

(9,[13,16,20]) into [9,13,16,20]. Since $N_2$ is tree-child, DCTC(N) = $N_2$ for N in Fig. 12

Often when there is an allowable parent chain $p_k, p_{k-1}, \ldots, p_0 = c$ for an obstacle c, some of $p_{k-1}, \ldots, p_1$ are also obstacles. This is not always true, however, as is seen from the parent chain 13, 16, 20, in which 16 is not an obstacle since it has the tree-child 20. This does not make 20 of type 1, since the tree-child of 16 is in the parent chain.

([13, 16, 20]), so $M_D(N)$ is not SCD. The procedure then has us compute SCD($M_D(N)$). To find SCD($M_D(N)$), only (9,[13,16,20]) is merged into [9,13,16,20]. Figure 14 shows $N_2$ = SCD($M_D(N)$). Since $N_2$ is tree-child, we find DCTC(N) = $N_2$, and its height is 2. If now ψ: N → DCTC(N), we have, for example, $\psi^{-1}([10, 14, 15]) = \{10, 14, 15\}$. $N_2$ has three redundant arcs ([9,13,16,20],19), ([9,13,16,20],21), and ([10,14,15],18).

Figure 15 shows the wired lift ($\psi^{-1}$, $E_1$) of DCTC(N) into N. In Fig. 15, $E_1$ consists of the solid arcs, and dashed arcs correspond to identifications. Thus, 10 ∼ 15 ∼ 14 and 9 ∼ 13 ∼ 16 ∼ 20 can be recognized from the dashed arcs, indicating that DCTC(N) includes the vertices [10,14,15] and [9,13,16,20]. Paths in DCTC(N) correspond to g-paths in the wired lift. For example, there is no path in N from 10 to 7. There is, however, a path [10,14,15], 7 in DCTC(N), which corresponds to the g-path 10, 15, 14, 7 in the wired lift, since dashed arcs can be followed either forwards or backwards. Table 1 compares a number of different networks related to Fig. 12.
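Because the definition of FHS(N) recalled at the start of this example is purely combinatorial, it can be sketched directly. This is our own brute-force reading of the definition, not the implementation of Francis et al. (2021); "visible vertex" is taken to mean visible to at least one leaf, and the input network and function names are hypothetical. In the toy network, vertex o is not visible to any leaf, so it is dropped, and r then has a single arc to s.

```python
from collections import defaultdict


def fhs(arcs, root):
    """FHS(N) as defined above: visible vertices, and an arc (u, v) whenever u is a proper
    ancestor of v with no visible vertex strictly between them."""
    kids, vertices = defaultdict(set), set()
    for u, v in arcs:
        kids[u].add(v)
        vertices |= {u, v}
    leaves = {u for u in vertices if not kids.get(u)}

    def reach(start, blocked=None):
        seen, stack = {start}, [start]
        while stack:
            for c in kids.get(stack.pop(), ()):
                if c != blocked and c not in seen:
                    seen.add(c)
                    stack.append(c)
        return seen

    def visible_to(v, x):  # every root-to-x path meets v iff x is unreachable once v is deleted
        return v == root or x not in reach(root, blocked=v)

    vis = {v for v in vertices if any(visible_to(v, x) for x in leaves)}
    below = {u: reach(u) - {u} for u in vis}
    new_arcs = {(u, v) for u in vis for v in below[u] & vis
                if not any(w != v and w in below[u] and v in below[w] for w in vis)}
    return vis, new_arcs


N = {("r", "o"), ("r", "s"), ("o", "h1"), ("o", "h2"), ("s", "h1"),
     ("s", "h2"), ("s", "3"), ("h1", "1"), ("h2", "2")}
vis, arcs = fhs(N, "r")
print(sorted(set().union(*N) - vis), sorted(arcs))
```

Note that no CSD-style identification takes place here: the invisible vertex is simply removed, which is one way to see why there need not be a CSD map from N to FHS(N).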
The normal network FHS(N) has o1 = 2 vertices of out-degree one, but it turns out that both such vertices have leaves as children. Hence SCD(FHS(N)) = S′(FHS(N)) = FHS(N), which is therefore also DCTC by Corollary 8.2. Similarly, Norm(N) has o1 = 0 vertices of out-degree one, so it is also DCTC. Table 1 thus contains the three DCTC networks DCTC(N), Norm(N), and FHS(N). There is a CSD map ψ: N → DCTC(N). There is, however, no CSD map from N to Norm(N) (only a connected map; see Willson (2022)), and no CSD map from N to FHS(N).
Of the three DCTC networks, DCTC(N ) as expected contains redundant arcs, and the others do not. Somewhat surprisingly it has 3 redundant arcs when N had none. It also contains the most hybrid vertices among the three DCTC networks, just one more than FHS(N ) but three more than Norm(N ).
From the mrca column, in N , mrca(x, y) exists for all x, y ∈ X and this is also true for DCTC(N ) and Norm (N ). Surprisingly, Table 1 shows that for exactly one {x, y}, mrca(x, y; FHS(N )) does not exist; this turns out to be mrca(2, 3).
We see that FHS(N) is closer to the data set N than DCTC(N) is, in the sense of $d_{RF}$, because $d_{RF}(N, \mathrm{FHS}(N)) < d_{RF}(N, \mathrm{DCTC}(N))$. But DCTC(N) contains an additional reticulation and three redundant arcs indicating where some specific modifications arose as N was simplified. Moreover, DCTC(N) has a wired lift.

Marcussen et al. (2015) study the angiosperm genus Viola and present a phylogenetic network N with 16 leaves and with 21 proposed polyploid speciations in their Fig. 4. In our Fig. 16, we show the wired lift of DCTC(N). If all the arcs are instead made solid, we obtain the network of Marcussen et al. (2015). Here we will sometimes abbreviate DCTC(N) by DCT.

Fig. 16 The wired lift for DCTC(N), where N is the network for Viola in Marcussen et al. (2015). The line segment with both ends 42 is regarded as a single vertex. This wired lift differs substantially from the wired lift of Norm(N) in Willson (2022)

A normal network (which I denote FHS(N)) obtained as a simplification of N has been published in Francis et al. (2021), and another (called Norm(N)) in Willson (2022). Both differ substantially from DCT. Table 2 makes a comparison among several networks related to N. To find DCT, we first compute SCD(N), for which the data are also shown in Table 2. It contains 0tc = 3 tree-child obstacles: one of type 1 with 1 allowable parent chain, one of type 1 with 2 allowable parent chains, and one of type 2 with one allowable parent chain, leading to D containing 5 arcs. Then SCD($M_D$(SCD(N))) is DCTC and hence is DCT = DCTC(N), with height 2. FHS(N) is normal. By Corollary 8.2, SCD(FHS(N)) is DCTC, but it is more easily computed as S′(FHS(N)). From Table 2, FHS(N) has o1 = 3 vertices with out-degree

Table 2 Comparison of networks related to N for Viola from Marcussen et al. (2015). All have the same number n = 16 of leaves. The number of 2-sets of leaves is 120. None of the networks contain trivial vertices.
Other parameters are those discussed in Sect. 10. The networks DCTC(N), Norm(N), and S′(FHS(N))

(x, y; N). For example, mrca(45, 50; N) does not exist, while 28 = mrca(45, 50; DCT). The use of mrca(x, y; DCT) when mrca(x, y; N) does not exist can narrow the range of sources of common features of x and y.

Example 2
In the wired lift, $E_1$ contains 49 solid arcs and 32 dashed arcs. Hence, some of the 42 arcs of DCT are represented by more than one member of $E_1$. For example, (14,10) and (17,18) represent the same arc of DCT, since 14 and 17 are identified (as indicated by dashed arcs), and similarly 10 and 18 are identified.
The networks Prenorm(N ) and Norm(N ) are described in more detail in Willson (2022), and FHS(N ) in Francis et al. (2021). Norm(N ) is constructed by removing the redundant arcs from Prenorm(N ).
In this example, Prenorm(N ) is not tree-child, since 0tc = 1. Note that Norm(N ) has 3 fewer vertices, 11 fewer arcs, and 8 fewer reticulations than DCT. Thus, Norm(N ) contains substantially less information about the dataset than does DCT. The loss of the hybrids and arcs is largely because many were associated with the 10 redundant arcs that were removed from Prenorm(N ) to make Norm(N ).
It is interesting that FHS(N ) has only 3 hybrid vertices, since its construction does not involve SCD, which made an initial large drop in hybrid vertices for the calculation of both DCT and Norm(N ).
Also from the table, $d_{RF}(N, \mathrm{DCT}) = 6$, $d_{RF}(N, \mathrm{Norm}(N)) = 5$, and $d_{RF}(N, S'(\mathrm{FHS}(N))) = 4$. Hence, DCT is not as close an approximation to N as either of the others in terms of $d_{RF}$. DCT has the advantage over Norm(N) and S′(FHS(N)) of many more confirmed hybrid vertices and many more arcs. Moreover, there is a CSD map ψ: N → DCT. In contrast, Norm(N) has only a connected map ψ: N → Norm(N), which is a weaker condition. DCT also has the advantage over Norm(N) of a significantly larger set $E_1$ (rather than the many dashed arcs in Norm(N) needed to avoid redundant arcs).
In Sect. 12, we will make some further comments about Example 2.

Example 3
Here is another example with real data. Kamneva et al. (2017) study allopolyploid origins in strawberries (Fragaria). In their Additional File 4, Figure  S7-9a is the cluster network for their dataset 9, constructed using all fragments passing the SH test against 100 random trees by support at least 15%. Let N be the network of their Figure S7-9a. The wired lift of DCTC(N ) is shown in our Fig. 17. Table 3 compares some networks related to N .
There are 13 leaves, with Drymocallis being the outgroup that roots the network. SCD(N) is found by merging the single arc (25,28) (SCD(N)), we must merge the arc (22, [23, 24, 25, 28]) to obtain

Figure S7-9a, concerning strawberries (Fragaria). If all the arcs are solid, we obtain their figure S7-9a. The dashed arcs indicate identifications of vertices, so [22,23,24,25,28] is one vertex of DCTC(N). The solid arcs represent arcs of DCTC(N). For example, [22,23,24,25,28] has tree-children 26 and 29

Table 3 Comparison of networks related to N for Figure Fig. 17.

Table 3 compares information about some relevant networks. It is clear that S′(FHS(N)) = FHS(N) and S′(Norm(N)) = Norm(N), so by Corollary 8.2 both FHS(N) and Norm(N) are SCD and hence are DCTC X-networks. There is, however, no CSD map from N to FHS(N) or to Norm(N).

Discussion
Given a general network N , we have seen that DCTC(N ) has interesting mathematical properties. The biological significance of such a network DCTC(N ), however, is less clear.
The construction of DCTC(N) usually involves two operations: the first is the calculation of SCD(N), and the second is the calculation of $M_D(\mathrm{SCD}(N))$, where D consists of the arcs of some allowable parent chains. We consider these two operations separately here.
To study the first operation, consider Fig. 18, showing a network N on the left with SCD(N) on the right. In N, there is a "ladder" structure involving 4, 5, 6, 7, and 8. Note that cl(5) = cl(6) = cl(7) = cl(8) = {2, 3}. Hence, in SCD(N), the arcs (5, 6), (6, 7), and (7, 8) are contracted, so that 5, 6, 7, and 8 are all identified into the single vertex [5,6,7,8]. This simplification recognizes the difficulty of distinguishing these vertices using only the data on the leaves, since those data usually consist largely of the genomes at the leaves. All that distinguishes them is their placement relative to the root 4. The contribution of 5 cannot readily be distinguished from the contribution of 6, since both affect exactly the same leaves. While N may be correct in the sense that there might have been several stages of contribution to the genome of 9, nothing really identifies the relevant species. The result would be indistinguishable if the species 5, 6, 7, and 8 were permuted. It can be argued that the data do not really support a network with distinct vertices 5, 6, 7, and 8; the simplification of N into SCD(N) is an appropriate indication of what is really justified.

Fig. 19 Adding more leaves to the network N of Fig. 18 retains the ladder and indicates the order of the vertices using information on the leaves. This network M is DCTC

Fig. 20 A network N on the left and SCD(N) on the right. There is a "ladder" of vertices with hybrid children
The fact that these identifications occur in DCTC(N) also suggests a remedy: the addition of more leaves can separate 5, 6, and 7, as shown in the network M of Fig. 19. In M, cl(5) = {2, 3, 10, 11, 12} and cl(6) = {2, 3, 11, 12}, so cl(5) ≠ cl(6). In fact, M is DCTC, so DCTC(M) = M. Similar results could occur even if the arcs to the new leaves were replaced by tree-child paths to new leaves. Of course, finding the new leaves could be difficult; indeed, it is possible that there are no extant descendants of 5, 6, or 7 by tree-child paths if there were many extinction events along such lines of descent.
Less extreme "ladders" can occur. Figure 21 shows a wired lift of SCD(N), where N is the Viola dataset for Example 2 of Sect. 11. There is a ladder involving 36, 37, 56, 57, 58, all with the same cluster {60, 61}. There is a smaller ladder involving 38, 40, and 41, and another involving 15, 17, 20, 24. The merging of arcs indicates, for example, that more leaves would be needed to clarify the difference between vertices such as 56 and 57.

Fig. 22 An SCD network N on the left and DCTC(N) on the right. Note that 7 is an obstacle of type 1 and is not visible

Figure 21 also shows several hybrid vertices a in N with a unique child b which is not a leaf, so cl(a) = cl(b) and a and b are not distinguishable from the data; these include a = 10, 13, 20, 24, 32, 37, 39, 43, and 48. In N, such arcs (a, b) may merely indicate the distinction between a hybrid vertex a and the next descendant b where another speciation event occurs. If b is a tree-child, as in the case a = 13, b = 14, then the merging is merely an artifact of our definition of SCD (so that only leaves φ(x) can be the unique child of a hybrid vertex p(x)), and these can be mentally retained. When b is hybrid, as for example in the case a = 48, b = 47, then a and b are parts of another ambiguous ladder, justifying the identification of 47 and 48.
We see that in general the merging of arcs from N in the formation of SCD(N ) identifies some ambiguities in interpreting the original network. When the networks are SCD, ambiguities of that sort are not present.
The interpretation of the second operation (finding $M_D(\mathrm{SCD}(N))$, where D consists of the arcs of some allowable parent chains, to make the network tree-child) is different. Figure 22 shows an SCD network N on the left and DCTC(N) on the right. In this case 7 is an obstacle of type 1 with allowable parent chain 5, 7. In DCTC(N) the arc (5, 7) has therefore been contracted to a point [5,7]. Note that in N, cl(5) = {1, 2, 3, 4}, while cl(7) = {2, 3}. As a result, the networks N and DCTC(N) differ in their resultant statistical effects on the genomes. For example, if N is assumed to describe the genetic history, a mutation from 5 which is absent in 1 and 4 but present in 2 and 3 should be more common than if DCTC(N) is assumed instead.
In general, if $p_k, p_{k-1}, \ldots, p_1, p_0 = c$ is an allowable parent chain for an obstacle c of type k, it is immediate that $\mathrm{cl}(p_i; N) \subseteq \mathrm{cl}([p_k, \ldots, c])$ for each $i \le k$, where the latter cluster is interpreted in DCTC(N). Any inheritance suggested by N is therefore also possible in DCTC(N), while DCTC(N) admits additional possibilities and the statistics of the mutations may change. The question is whether the additional properties of DCTC(N) are useful enough to justify the change.
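The inclusion of clusters under contraction can be checked mechanically. Below is a minimal Python sketch that contracts a given chain of vertices into one vertex and compares clusters before and after. It does not decide whether a chain is an allowable parent chain (that definition is not reproduced here), it collapses any parallel arcs created by the merge (an assumption of this sketch), and the toy network and the names `contract` and `clusters` are hypothetical. As with cl(7) = {2, 3} versus cl([5,7]) = {1, 2, 3, 4} in Fig. 22, contraction enlarges the cluster of the lowest vertex of the chain.

```python
# Hypothetical toy network: c is a hybrid vertex with parents p and q,
# and p, c is the chain to be contracted.
ARCS = {
    "r": ["p", "q"],
    "p": ["c", "u"],
    "q": ["c", "v"],
    "c": ["s", "t"],
    "u": [], "v": [], "s": [], "t": [],
}


def clusters(arcs):
    """Return {v: cl(v)}: the set of leaves descended from each vertex v."""
    memo = {}

    def cl(v):
        if v not in memo:
            kids = arcs[v]
            memo[v] = frozenset([v]) if not kids else frozenset().union(*(cl(k) for k in kids))
        return memo[v]

    for v in arcs:
        cl(v)
    return memo


def contract(arcs, chain):
    """Contract the arcs among the vertices of `chain` into a single vertex
    named '[v1,...,vn]'. Every other arc into or out of a chain vertex is
    redirected to the merged vertex; parallel arcs are collapsed."""
    members = set(chain)
    merged = "[" + ",".join(chain) + "]"

    def rename(v):
        return merged if v in members else v

    new = {}
    for v, kids in arcs.items():
        out = new.setdefault(rename(v), [])
        for k in kids:
            if v in members and k in members:
                continue  # arc inside the chain: contracted away
            if rename(k) not in out:
                out.append(rename(k))
    return new, merged


cl_N = clusters(ARCS)
M, merged = contract(ARCS, ["p", "c"])
cl_M = clusters(M)
# Every member of the chain has its cluster contained in the merged cluster.
assert all(cl_N[v] <= cl_M[merged] for v in ["p", "c"])
print(sorted(cl_N["c"]), sorted(cl_M[merged]))  # ['s', 't'] ['s', 't', 'u']
```

The merged vertex inherits every descendant of each chain member, which is why the inclusion holds; the converse fails, reflecting the extra inheritance possibilities that DCTC(N) admits.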
Recall that a vertex v in N is visible to a leaf φ(x) for x ∈ X if every path from the root ρ to φ(x) contains v. Thus, the genome at v is very likely to affect the genome of each such φ(x). Note that in the network N of Fig. 22, 7 is not visible. This fact makes the influence of 7 in N on the genomes of the leaves hard to interpret. By contrast, every vertex of DCTC(N) is visible. In DCTC(N), 6 is visible to 1, 9 to 2, 10 to 3, and 8 to 4, while [5,7] is visible to all leaves and each leaf is visible to itself. Each vertex has at least one leaf to which it is visible.
Figure 23 shows another SCD network N and, below it, DCTC(N). Note that 9 and 12 are not visible. Moreover, they are obstacles of type 1 with allowable parent chains 7, 9 and 10, 12, respectively. The effects of 9 and 12 on the genomes at the leaves are difficult to understand because there are several possibilities for the inheritance of the genomes. In contrast, every vertex of DCTC(N) is visible. The root [7,9] is visible to every leaf, [10,12] is visible to 4, 5, and 6; 13 is visible to 6; 15 to 4 and 5; 8 to 1; 11 to 2; and 14 to 3.
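Visibility can be tested with a reachability search: v is visible to a leaf exactly when that leaf can no longer be reached from the root once v is deleted. The following minimal Python sketch computes, for each vertex, the set of leaves to which it is visible, and lists the vertices visible to no leaf at all. The toy network and the function names are hypothetical illustrations (not the networks of Figs. 22 and 23), and the sketch assumes every leaf is reachable from the root, as in any rooted phylogenetic network.

```python
# Hypothetical toy network: vertex a has only hybrid children (h1, h2), and
# both of them are also children of b, so no leaf depends on a.
ARCS = {
    "r": ["a", "b"],
    "a": ["h1", "h2"],
    "b": ["h1", "h2", "w"],
    "h1": ["x"],
    "h2": ["y"],
    "x": [], "y": [], "w": [],
}


def reachable(arcs, root, banned):
    """Vertices reachable from root by directed paths avoiding `banned`."""
    if root == banned:
        return set()
    seen, stack = {root}, [root]
    while stack:
        v = stack.pop()
        for k in arcs[v]:
            if k != banned and k not in seen:
                seen.add(k)
                stack.append(k)
    return seen


def visibility(arcs, root):
    """Map each vertex v to the leaves to which v is visible, i.e. the leaves
    that become unreachable from the root once v is deleted."""
    leaves = {v for v, kids in arcs.items() if not kids}
    return {v: leaves - reachable(arcs, root, v) for v in arcs}


def invisible_vertices(arcs, root):
    """Vertices that are visible to no leaf."""
    return sorted(v for v, vis in visibility(arcs, root).items() if not vis)


print(invisible_vertices(ARCS, "r"))  # ['a']
```

The root is visible to every leaf and each leaf is visible to itself, matching the remarks about DCTC(N) above; a nonempty result from `invisible_vertices` flags the kind of vertex that the passage from SCD(N) to DCTC(N) modifies.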
Genetic influence is easiest to interpret using tree-child paths. Note that N contains no tree-child path from 7 or 9 to 6, whereas [7,9], [10,12], 13, 6 is a tree-child path in DCTC(N).
In general, DCTC(N) is obtained by making minimal simplifications to N so that every vertex becomes visible. As a result, every vertex has at least one leaf on which its genetic influence is important. Changes from SCD(N) to DCTC(N) indicate vertices of SCD(N) that fail to be visible.