New FPT algorithms for finding the temporal hybridization number for sets of phylogenetic trees

We study the problem of finding a temporal hybridization network for a set of phylogenetic trees that minimizes the number of reticulations. First, we introduce an FPT algorithm for this problem on an arbitrary set of $m$ binary trees with $n$ leaves each with a running time of $O(5^k\cdot n\cdot m)$, where $k$ is the minimum temporal hybridization number. We also present the concept of temporal distance, which is a measure for how close a tree-child network is to being temporal. Then we introduce an algorithm for computing a tree-child network with temporal distance at most $d$ and at most $k$ reticulations in $O((8k)^d 5^k\cdot n\cdot m)$ time. Lastly, we introduce an $O(6^kk!\cdot k\cdot n^2)$ time algorithm for computing a minimum temporal hybridization network for a set of two nonbinary trees. We also provide an implementation of all algorithms and an experimental analysis of their performance.


Introduction
Phylogenetics is the study of the evolutionary history of biological species. Traditionally such a history is represented by a phylogenetic tree. However, hybridization and horizontal gene transfer, both so-called reticulation events, can lead to multiple seemingly conflicting trees representing the evolution of different parts of the genome [12,14]. Directed acyclic networks can be used to combine these trees into a more complete representation of the history [1]. Reticulations are represented by vertices with in-degree greater than one.
Therefore, an important problem is how to construct such a network based on a set of input trees that are known to represent the evolutionary history for different parts of the genome. The network should display all of these input trees. In general there are many solutions to this problem, but in accordance with the parsimony principle we are especially interested in the simplest solutions, which are those with a minimal number of reticulations. Finding a network for which the number of reticulations, also called the hybridization number, is minimal is therefore an optimization problem. This problem is NP-complete, even for only two binary input trees [3]. The problem is fixed parameter tractable for an arbitrary set of non-binary input trees if either the number of trees or the out-degree in the trees is bounded by a constant [16]. For a set of two binary input trees an FPT algorithm with a reasonable running time exists [2]. For more than two input trees theoretical FPT algorithms and practical heuristic algorithms exist, but no FPT algorithm with a reasonable running time is known. That is why we are interested in slightly modifying the problem to make it easier to solve.
One way to do this is by restricting the solution space to the class of tree-child networks, in which each non-leaf vertex has at least one outgoing arc that does not enter a reticulation [5]. The minimum hybridization number over all tree-child networks that display the input trees is called the tree-child hybridization number. These networks can be characterized by so-called cherry picking sequences [11]. This characterization can be used to create a fixed parameter tractable algorithm for this restricted version of the problem for any number of binary input trees with time complexity $O((8k)^k \cdot \mathrm{poly}(n, m))$, where $k$ is the tree-child hybridization number, $n$ is the number of leaves and $m$ is the number of input trees [15].
The solution space can be reduced even further [7], leading to the problem of finding the temporal hybridization number. The extra constraints enforce that each species can be placed at a certain point in time such that evolution events take a positive amount of time and that reticulation events can only happen between species that live at the same time. For the problem of computing the temporal hybridization number a cherry picking characterization exists too, and it can be used to develop a fixed parameter tractable algorithm for problems with two binary input trees with time complexity $O((7k)^k \cdot \mathrm{poly}(n, m))$, where $k$ is the temporal hybridization number, $n$ is the number of leaves and $m$ is the number of input trees [7]. In this paper we introduce a faster algorithm for solving this problem in $O(5^k \cdot n \cdot m)$ time using the cherry picking characterization. Moreover, this algorithm works for any number of binary input trees.
A disadvantage of the temporal restrictions is that in some cases no solution satisfying the restrictions exists. In fact, determining whether such a solution exists is an NP-hard problem [6,8]. Because of this our algorithm will not find a solution network for all problem instances. However, we show that it is possible to find a network with a minimum number of non-temporal arcs, thereby finding a network that is 'as temporal as possible'. For that reason we also introduce an algorithm that works for non-temporal instances. This algorithm is a combination of the algorithm for tree-child networks and the one for temporal networks introduced here.
In practical data sets, the trees for parts of the genome are often non-binary. This can be either due to simultaneous divergence events or, more commonly, due to uncertainty in the order of divergence events [9]. Many real-world datasets therefore contain non-binary trees, so it is very useful to have algorithms that allow for non-binary input trees. While the general hybridization number problem is known to be FPT when either the number of trees or the out-degree of the trees is bounded by a constant [16], an FPT algorithm with a reasonable running time ($O(6^k k! \cdot \mathrm{poly}(n))$) is only known for an input of two trees [13]. Until recently, however, no such algorithm was known for the temporal hybridization number problem. In this paper we introduce the first FPT algorithm for constructing optimal temporal networks based on two non-binary input trees, with running time $O(6^k k! \cdot k \cdot n^2)$.
We implemented and tested all new algorithms [4]. The structure of the paper is as follows. First we introduce some common theory and notation in Section 2. In Section 3 we present a new algorithm for the temporal hybridization number of binary trees, prove its correctness and analyse the running time. In Section 4 we combine the algorithm from Section 3 with the algorithm from [15] to obtain an algorithm for constructing tree-child networks with a minimum number of non-temporal arcs. In Section 5 we present the algorithm for the temporal hybridization number for two non-binary trees. In Section 6 we conduct an experimental analysis of the algorithms.

Trees
A rooted binary phylogenetic $X$-tree $T$ is a rooted binary tree for which the leaf set is equal to $X$ with $|X| = n$. Because we will mostly use rooted binary phylogenetic trees in this paper we will just refer to them as trees. Only in Section 5 are trees that are not necessarily binary considered, and there we will explicitly call them non-binary trees. Each of the leaves of a tree is an element of $X$. We will also refer to the set of leaves in $T$ as $L(T)$. For a tree $T$ and a set of leaves $A$, the notation $T \setminus A$ refers to the tree obtained by removing all leaves that are in $A$ from $T$ and repeatedly contracting all vertices with both in- and out-degree one. Observe that $(T \setminus \{x\}) \setminus \{y\} = T \setminus \{x, y\} = (T \setminus \{y\}) \setminus \{x\}$. We will often use $\mathcal{T}$ to refer to a set of $m$ trees $T_1, \ldots, T_m$. We will write $\mathcal{T} \setminus A$ for $\{T_1 \setminus A, \ldots, T_m \setminus A\}$ and $L(\mathcal{T}) = \bigcup_{i=1}^{m} L(T_i)$.
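The leaf-removal operation can be illustrated with a small sketch. The representation is an assumption made only for this example: a leaf is a string and an internal vertex is a tuple of its child subtrees. The names `remove_leaves` and `leaves` are hypothetical and are not taken from the authors' implementation [4].

```python
def remove_leaves(tree, removed):
    """Return the tree with all leaves in `removed` deleted, or None if nothing is left.

    Assumed representation: a leaf is a str, an internal vertex is a tuple of subtrees.
    """
    if isinstance(tree, str):
        return None if tree in removed else tree
    kept = [c for c in (remove_leaves(child, removed) for child in tree) if c is not None]
    if not kept:
        return None
    if len(kept) == 1:
        # This vertex would have in- and out-degree one, so it is contracted.
        return kept[0]
    return tuple(kept)


def leaves(tree):
    """Return the leaf set L(T) of a tree in the representation above."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


T = (("a", "b"), ("c", "d"))
one_by_one = remove_leaves(remove_leaves(T, {"a"}), {"c"})
at_once = remove_leaves(T, {"a", "c"})
```

In this example both `one_by_one` and `at_once` are the two-leaf tree `("b", "d")`, in line with the observation that the order of removal does not matter.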

Temporal networks
A network on X is a rooted acyclic directed graph satisfying:
1. The root ρ has in-degree 0 and an out-degree not equal to 1.
2. The leaves are the nodes with out-degree zero. The set of leaves is X.
3. The remaining vertices are tree vertices or hybridization vertices:
(a) A tree vertex has in-degree 1 and out-degree at least 2.
(b) A hybridization vertex (also called reticulation) has out-degree 1 and in-degree at least 2.
We will call the arcs ending in a hybridization vertex hybridization arcs. All other arcs are tree arcs. A network is a tree-child network if every tree vertex has at least one outgoing tree arc.
We say that a network $N$ on $X$ displays a set of trees $\mathcal{T}$ on $X'$ with $X' \subseteq X$ if every tree in $\mathcal{T}$ can be obtained from $N$ by removing edges and vertices and contracting vertices with both in-degree 1 and out-degree 1. For a set of leaves $A$ we define $N \setminus A$ to be the network obtained from $N$ by removing all leaves in $A$ and afterwards removing all nodes with out-degree zero and contracting all nodes with both in- and out-degree one.
Figure 2 (a): A temporal labeling is shown in the network, asserting that the network is temporal.
Figure 3: No temporal network that displays these trees exists.
For a tree-child network $N$, the hybridization number $h_t(N)$, written $r(N)$ in the remainder of the paper, is defined as
$$h_t(N) = \sum_{v \neq \rho} \left(d^-(v) - 1\right),$$
where $d^-(v)$ is the in-degree of a vertex $v$ and $\rho$ is the root of $N$. A tree-child network $N$ with set of vertices $V$ is temporal if there exists a map $t : V \to \mathbb{R}^+$, called a temporal labelling, such that for all $u, v \in V$ we have $t(u) = t(v)$ when $(u, v)$ is a hybridization arc and $t(u) < t(v)$ when $(u, v)$ is a tree arc. In Fig. 2 both a temporal and a non-temporal network are shown.
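The definitions above translate directly into checks on an arc list. The following is a minimal sketch under assumptions made for this example only: a network is given as a list of arcs `(u, v)` with a single root, and the function names are hypothetical.

```python
from collections import Counter


def hybridization_number(arcs):
    """Sum of (in-degree - 1) over all vertices other than the root."""
    indeg = Counter(head for _, head in arcs)
    # The root has in-degree 0 and therefore never appears as a head.
    return sum(d - 1 for d in indeg.values())


def is_tree_child(arcs):
    """Check that every tree vertex has at least one outgoing tree arc."""
    indeg = Counter(head for _, head in arcs)
    children = {}
    for tail, head in arcs:
        children.setdefault(tail, []).append(head)
    for v, kids in children.items():
        is_tree_vertex = indeg[v] == 1 and len(kids) >= 2
        if is_tree_vertex and all(indeg[c] >= 2 for c in kids):
            return False
    return True


def is_temporal_labeling(arcs, t):
    """Check t(u) == t(v) on hybridization arcs and t(u) < t(v) on tree arcs."""
    indeg = Counter(head for _, head in arcs)
    for u, v in arcs:
        if indeg[v] >= 2:          # (u, v) ends in a hybridization vertex
            if t[u] != t[v]:
                return False
        elif not t[u] < t[v]:      # (u, v) is a tree arc
            return False
    return True


# A network with one reticulation h below the tree vertices u and v.
arcs = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
        ("v", "h"), ("v", "b"), ("h", "c")]
labels = {"r": 1, "u": 2, "v": 2, "h": 2, "a": 3, "b": 3, "c": 3}
```

For this example network the hybridization number is 1, the network is tree-child, and `labels` is a temporal labeling. The checks do not verify that the arc list is a valid network in the first place.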
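For a set of trees $\mathcal{T}$ we define the minimum temporal-hybridization number as
$$h_t(\mathcal{T}) = \min\{h_t(N) : N \text{ is a temporal network that displays } \mathcal{T}\}.$$
This definition leads to the following decision problem.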

Temporal hybridization
Instance: A set of trees $\mathcal{T}$ and an integer $k$.
Question: Is $h_t(\mathcal{T}) \le k$?
Note that there are sets of trees such that no temporal network exists that displays them. In Fig. 3 an example is given. For such a set $\mathcal{T}$ we have $h_t(\mathcal{T}) = \infty$.

Cherry picking sequences
Temporal networks can now be characterized by so-called cherry-picking sequences [7]. A cherry is a set of children of a tree vertex that only has leaves as children. So for binary trees a cherry is a pair of leaves. For a tree $T$ and a set of trees $\mathcal{T}$, we will write $(a, b) \in T$ if $\{a, b\}$ is a cherry of $T$, and $(a, b) \in \mathcal{T}$ if there is a $T \in \mathcal{T}$ with $(a, b) \in T$. First we introduce some notation to make it easier to speak about cherries.
Definition 2.1. For a set of binary trees $\mathcal{T}$ on the same taxa define $H(\mathcal{T})$ to be the set of leaves that are in a cherry in every tree.
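If two leaves are in a cherry together we call them neighbors. We also introduce notation to speak about the neighbors of a given leaf:
Definition 2.2. For a set of binary trees $\mathcal{T}$ containing a leaf $x$ define $N_{\mathcal{T}}(x) = \{y : (x, y) \in \mathcal{T}\}$, the set of neighbors of $x$.
Definition 2.3. For a set of binary trees $\mathcal{T}$ containing a leaf $x$ define $w_{\mathcal{T}}(x) = |N_{\mathcal{T}}(x)| - 1$. We will also call this the weight of $x$ in $\mathcal{T}$.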
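To make the notation concrete, the sketch below computes cherries, neighbors, the set of leaves that are in a cherry in every tree, and the weight of a leaf. It assumes, for illustration only, that a binary tree is a nested pair with string leaves; all function names are hypothetical.

```python
def cherries(tree):
    """Set of cherries of a binary tree, each cherry as a frozenset {a, b}."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return cherries(left) | cherries(right)


def neighbors(trees, x):
    """Leaves that share a cherry with x in at least one tree."""
    return {y for t in trees for c in cherries(t) if x in c for y in c if y != x}


def leaves_in_cherry_everywhere(trees):
    """Leaves that are in a cherry in every tree (the set H in the text)."""
    per_tree = [set().union(*cherries(t)) for t in trees]
    return set.intersection(*per_tree)


def weight(trees, x):
    """Number of neighbors of x minus one."""
    return len(neighbors(trees, x)) - 1


T1 = ((("a", "b"), "c"), "d")
T2 = ((("a", "c"), "b"), "d")
trees = [T1, T2]
```

Here `a` is the only leaf in a cherry in both trees, its neighbors are `b` and `c`, and its weight is 1.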
Using this theory, we can now give the definition of cherry picking sequences. … For a cherry picking sequence $s$ with $s_i = x$ we say that $x$ is picked in $s$ at index $i$. A temporal network that displays $\mathcal{T}$ with a given number of reticulations exists if and only if a cherry picking sequence for $\mathcal{T}$ of that weight exists. This has been proven in [7, Theorem 1, Theorem 2]. The proof works by constructing a cherry picking sequence from a temporal network and vice versa. Here, we only repeat the construction to aid the reader, and refer to [7] for the proof of correctness.
The construction of a cherry picking sequence $s$ from a temporal network $N$ with temporal labeling $t$ works in the following way: for $i = 1$ choose $s_i$ to be a leaf $x$ of $N$ such that $t(p_x)$ is maximal, where $p_x$ is the parent of $x$ in $N$. Then increase $i$ by one and again choose $s_i$ to be a leaf $x$ of $N \setminus \{s_1, \ldots, s_{i-1}\}$ that maximizes $t(p_x)$, where $p_x$ is the parent of $x$ in $N \setminus \{s_1, \ldots, s_{i-1}\}$. In [7, Theorem 1, Theorem 2] it is shown that $s$ is then a cherry picking sequence with $w_{\mathcal{T}}(s) = r(N)$.
The construction of a temporal network $N$ from a cherry picking sequence $s$ is somewhat more technical. For a cherry picking sequence $s = (s_1, \ldots, s_n)$, define $N_n$ to be the tree consisting only of a root and the leaf $s_n$. Now obtain $N_i$ from $N_{i+1}$ by adding node $s_i$ and a new node $p_{s_i}$, adding the edge $(p_{s_i}, s_i)$, subdividing $(p_x, x)$ with a node $q_x$ for every $x \in N_{\mathcal{T} \setminus \{s_1, \ldots, s_{i-1}\}}(s_i)$, adding an edge $(q_x, p_{s_i})$ for each such $x$, and finally suppressing all nodes with in- and out-degree one. Then $N = N_1$ displays $\mathcal{T}$ and $r(N) = w_{\mathcal{T}}(s)$.
The theorem implies that the weight of a minimum weight CPS is equal to the temporal hybridization number of the trees. Because finding an optimal temporal reticulation network for a set of trees is an NP-hard problem [8], this implies that finding a minimum weight CPS is an NP-hard problem.
… Figure 1, together with a constraint $(b, d)$. Two elements $x, y \in X$ are depicted as adjacent if $x \in N_{\mathcal{T}}(y)$, i.e. if $x$ and $y$ appear in a cherry together. An arc from $x$ to $y$ indicates the presence of a constraint $(x, y)$.
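The following sketch checks whether a given sequence is a cherry picking sequence and computes its weight. It relies on two assumptions that are not spelled out in the text above and are taken from the usual definition in [7]: every picked leaf except the last must be in a cherry in every remaining tree, and the weight of the sequence is the sum of the weights of the picked leaves at the moment they are picked. Trees are nested pairs with string leaves and all function names are hypothetical.

```python
def cherries(tree):
    """Set of cherries of a binary tree, each as a frozenset {a, b}."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return cherries(left) | cherries(right)


def leaf_set(tree):
    if isinstance(tree, str):
        return {tree}
    return leaf_set(tree[0]) | leaf_set(tree[1])


def prune(tree, x):
    """Remove leaf x and contract the vertex left with a single child."""
    if isinstance(tree, str):
        return None if tree == x else tree
    kept = [c for c in (prune(child, x) for child in tree) if c is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)


def cps_weight(trees, seq):
    """Return the weight of seq if it is a cherry picking sequence, else None."""
    if sorted(seq) != sorted(leaf_set(trees[0])):
        return None  # every leaf has to appear exactly once
    total = 0
    for x in seq[:-1]:
        nbrs = set()
        for t in trees:
            partners = {y for c in cherries(t) if x in c for y in c if y != x}
            if not partners:
                return None  # x is not in a cherry of this tree
            nbrs |= partners
        total += len(nbrs) - 1
        trees = [prune(t, x) for t in trees]
    return total


T1 = ((("a", "b"), "c"), "d")
T2 = ((("a", "c"), "b"), "d")
```

With these two trees, `cps_weight([T1, T2], ["a", "b", "c", "d"])` is 1, while `["b", "a", "c", "d"]` is rejected because `b` is not in a cherry of `T2`.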
Definition 2.7. We call two sets of trees $\mathcal{T}$ and $\mathcal{T}'$ equivalent if a bijection from $L(\mathcal{T})$ to $L(\mathcal{T}')$ exists that transforms $\mathcal{T}$ into $\mathcal{T}'$. We call them equivalent because they have the same structure and consequently the same (temporal) hybridization number; however, the biological interpretation can be different. We will write this as T T .

Algorithm for constructing temporal networks from binary trees
Finding a cherry picking sequence comes down to deciding in which order to pick the leaves. Our algorithm relies on the observation that this order does not always matter. Intuitively the observation is that the order of two leaves in a cherry picking sequence only matters if they appear in a cherry together somewhere during the execution of the sequence. Therefore the algorithm keeps track of the pairs of leaves for which the order of picking matters. We will make this more precise in the remainder of this section. The algorithm now works by branching on the choice of which element of a pair to pick first. These choices are stored in a so-called constraint set. Each call to the algorithm branches into subcalls with more constraints added to the constraint set. As soon as it is known that a certain leaf has to be picked before all of its neighbors and is in a cherry in all of the trees, the leaf can be picked. Intuitively, a cherry picking sequence satisfies a constraint set if for every pair (a, b) in the set a is picked with positive weight and (a, b) is a cherry just before picking a. This implies that a occurs in the cherry picking sequence before b.
We now prove a series of results about what sets of constraints are valid, which will then be used to guide our algorithm.
Observation 3.2. Let $s$ be a cherry picking sequence for $\mathcal{T}$ and $w_{\mathcal{T}}(x) > 0$ and $a, b \in N_{\mathcal{T}}(x)$. Then $s$ satisfies one of the following constraint sets: …
Proof. Let $i$ be the lowest index such that $s_i \in \{x, a, b\}$. If $s_i = x$, then $(x, a) \in \mathcal{T} \setminus \{s_1, \ldots$ …
… the algorithm. It is possible to implement an algorithm using only this rule, but the running time of the algorithm can be improved by using a second rule that branches into only two subproblems when it is applicable. The rule relies on the following observation. Note that we will write $\pi_i(C)$ for the set obtained by projecting every element of $C$ to the $i$'th coordinate.
Observation 3.4. If $C$ is satisfied by $s$ then for all $x \in \pi_1(C)$ and $y \in N_{\mathcal{T}}(x)$ we have that $s$ satisfies $C \cup \{(x, y)\}$ or $C \cup \{(y, x)\}$.
Using this observation we can let the algorithm branch into two paths by either adding $(x, y)$ or $(y, x)$ to the constraint set $C$ if $x \in \pi_1(C)$. We define $G(\mathcal{T}, C)$ to be the set of cherries for which there is no constraint in $C$, so $G(\mathcal{T}, C) = \{(x, y) : (x, y) \in \mathcal{T} \wedge (x, y), (y, x) \notin C\}$. Observe that $(x, y) \in G(\mathcal{T}, C)$ is equivalent to $(y, x) \in G(\mathcal{T}, C)$.
Before proving the next result about constraints, we need the following lemma. It states that, given a set of trees, a leaf that is in a cherry in all of the trees and a corresponding cherry picking sequence, the following holds: either we can move this leaf to the front of the sequence without affecting the weight of the sequence, or there is a neighbor of this leaf that occurs earlier in the sequence.
Lemma 3.6. Let $s = (s_1, s_2, \ldots)$ be a cherry picking sequence for a set of trees $\mathcal{T}$ that satisfies constraint set $C$. Let $x \in H(\mathcal{T})$. Then at least one of the following statements is true:
(1) $\exists i : s_i = x$ and $s' = (s_i, s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)$ is a cherry picking sequence for $\mathcal{T}$ satisfying $C$ and $w(s) = w(s')$.
(2) If $s_i = x$ then $\exists j : s_j \in N_{\mathcal{T}}(x)$ such that $j < i$.
Proof. Let $r$ be the smallest number such that $s_r \in N_{\mathcal{T}}(x) \cup \{x\}$. In case $s_r \neq x$ it follows directly that condition (2) holds for $j = r$. For $s_r = x$ we will prove that condition (1) holds with $i = r$. The key idea is that, because $s_i$ is not in a cherry with any of $s_1, \ldots, s_{i-1}$, removing $s_i$ first will not have any effect on the cherries involving $s_1, \ldots, s_{i-1}$. More formally, take an arbitrary tree $T \in \mathcal{T}$. Now take arbitrary $j, k$ with $s'_j = s_k$. We claim that for an arbitrary $z$ we have $(s'_j, z) \in T \setminus \{s'_1, \ldots, s'_{j-1}\}$ if and only if $(s_k, z) \in T \setminus \{s_1, \ldots, s_{k-1}\}$. For $s'_j = s'_1 = s_i = s_k$ this is true because none of the elements $s_1, \ldots, s_{i-1}$ are in $N_{\mathcal{T}}(s_i)$, so for each $z$ we have $(s'_1, z) \in T$ if and only if $(s_i, z) \in T \setminus \{s_1, \ldots, s_{i-1}\}$.
For k with k < i we have s j+1 = s j . Because … For k > i we have j = k and also T \{s …
As soon as we know that a leaf in $H(\mathcal{T})$ has to be picked before all its neighbors we can pick it, as stated by the following lemma.
Lemma 3.7. … Then there is a cherry picking sequence $s'$ with $s'_1 = x$ and $w(s') = w(s)$.
Proof. This follows from Lemma 3.6: statement (2) cannot be true, because for every $j$ with $s_j \in N_{\mathcal{T}}(x)$ we have $(x, s_j) \in C$ and therefore $i < j$ for $s_i = x$. So statement (1) has to hold, which yields a sequence $s'$ with $w(s) = w(s')$ and $s'_1 = x$.
The following lemma shows that we can also safely remove all leaves that are in a cherry with the same leaf in every tree.
Lemma 3.8. Let $s$ be a cherry picking sequence for $\mathcal{T}$ satisfying constraint set $C$ with $x \notin \pi_1(C)$ and $x \notin \pi_2(C)$. If $x \in H(\mathcal{T})$ and $w_{\mathcal{T}}(x) = 0$, then there is a cherry picking sequence $s'$ with $s'_1 = x$ and $w(s') = w(s)$ satisfying $C$.
Proof. Because $w_{\mathcal{T}}(x) = 0$ we have $N_{\mathcal{T}}(x) = \{y\}$ for a single leaf $y$. Then from Lemma 3.6 it follows that a sequence $s'$ exists such that either $s'' = (x)|s'$ or $s'' = (y)|s'$ is a cherry picking sequence for $\mathcal{T}$ with $w_{\mathcal{T}}(s'') = w(s)$ and $s''$ satisfies $C$. However, because the positions of $x$ and $y$ in the trees are equivalent (i.e. swapping $x$ and $y$ does not change $\mathcal{T}$), both are true.
We are almost ready to describe our algorithm. There is one final piece to introduce first: the measure $P(C)$. This is a measure on a set of constraints $C$, which will be used to provide a termination condition for our algorithm. We show below that $P(C)$ provides a lower bound on the weight of any cherry picking sequence satisfying $C$, and so if during any recursive call to the algorithm $P(C)$ is greater than the desired weight, we may stop that call.
… Proof. For $x = s_i$ with $i < n$ we prove that for …
We now present our algorithm, which we split into two parts. The main algorithm is CherryPicking, a recursive algorithm which takes as input parameters a set of trees $\mathcal{T}$, a desired weight $k$ and a set of constraints $C$, and returns a cherry picking sequence for $\mathcal{T}$ of weight at most $k$ satisfying $C$, if one exists.
The second part is the procedure Pick. In this procedure zero-weight cherries and cherries for which all neighbors are contained in the constraint set are greedily removed from the trees.
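For comparison, a naive exhaustive search for a minimum-weight cherry picking sequence can be sketched as follows. This is explicitly not Algorithm 1: it uses no constraint sets, no Pick procedure and no measure $P(C)$, and it has no $5^k$ bound; it simply tries every leaf that is in a cherry in all trees. It assumes the usual definition of a cherry picking sequence and its weight from [7], trees given as nested pairs with string leaves, and hypothetical function names.

```python
def cherries(tree):
    """Set of cherries of a binary tree, each as a frozenset {a, b}."""
    if isinstance(tree, str):
        return set()
    left, right = tree
    if isinstance(left, str) and isinstance(right, str):
        return {frozenset((left, right))}
    return cherries(left) | cherries(right)


def prune(tree, x):
    """Remove leaf x and contract the vertex left with a single child."""
    if isinstance(tree, str):
        return None if tree == x else tree
    kept = [c for c in (prune(child, x) for child in tree) if c is not None]
    return kept[0] if len(kept) == 1 else tuple(kept)


def min_weight_cps(trees, budget):
    """Return (weight, sequence) of a minimum-weight CPS of weight <= budget, or None."""
    if isinstance(trees[0], str):
        return (0, [trees[0]])  # a single leaf is left: it closes the sequence
    pickable = set.intersection(*(set().union(*cherries(t)) for t in trees))
    best = None
    for x in sorted(pickable):
        nbrs = {y for t in trees for c in cherries(t) if x in c for y in c if y != x}
        cost = len(nbrs) - 1
        if cost > budget:
            continue  # picking x would exceed the allowed weight
        rest = min_weight_cps([prune(t, x) for t in trees], budget - cost)
        if rest is not None and (best is None or cost + rest[0] < best[0]):
            best = (cost + rest[0], [x] + rest[1])
    return best


T1 = ((("a", "b"), "c"), "d")
T2 = ((("a", "c"), "b"), "d")
```

For the two example trees, a budget of 0 yields `None` and a budget of at least 1 yields a sequence of weight 1. The search is exponential in the number of leaves, which is what the branching on constraints in the algorithm of this section is designed to avoid.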

Proof of correctness
In this section a proof of correctness will be given. First some properties of the auxiliary procedure Pick are proven.
Lemma 3.12. Let $\mathcal{T}'$, $k'$, $C'$ and $p$ be the values returned by Pick($\mathcal{T}$, $k$, $C$). Then:
1. If a cherry picking sequence $s$ of weight at most $k$ for $\mathcal{T}$ that satisfies $C$ exists, then a cherry picking sequence $s'$ of weight at most $k'$ for $\mathcal{T}'$ that satisfies $C'$ exists.
2. If $s'$ is a cherry picking sequence of weight at most $k'$ for $\mathcal{T}'$ that satisfies $C'$, then $p|s'$ is a cherry picking sequence for $\mathcal{T}$ of weight at most $k$ and satisfying $C$.

Algorithm 2
1: procedure Pick($\mathcal{T}'$, $k'$, $C'$)
Proof. We will prove the first claim for ( … We will prove this with induction on $i$. For $i = 1$ this is obvious because … Now assume the claim is true for $i = i'$. Now there are two cases to consider: … that if a cherry picking sequence $s$ satisfying $C_i$ exists then also a cherry picking sequence $(x)|s'$ that satisfies $C$ exists with $w(p|(x)|s') = w(p|s)$. Note that this implies that $s'$ is a cherry picking sequence for … . So this proves the statement for $i = i' + 1$.
Let $j$ be the maximal value such that $x_j$ is defined in a given invocation of Pick. We will prove the second claim for $(\mathcal{T}, k, C, p) = (\mathcal{T}^{(i)}, k_i, C_i, p^{(i)})$ for all $i = 0, \ldots, j$ with induction on $i$. For $i = 0$ this is trivial. Now assume the claim is true for $i = i'$ and assume $s$ is a cherry picking sequence for $\mathcal{T}^{(i'+1)}$ of weight at most $k_{i'+1}$ that satisfies $C_{i'+1}$. Then if $x_{i'}$ is defined, it will be in $H(\mathcal{T}^{(i')})$, so $s' = (x_{i'})|s$ is a cherry picking sequence for $\mathcal{T}^{(i')}$. Because $s$ also satisfies … , $s'$ satisfies $C_{i'}$. Now it follows from the induction hypothesis that $p^{(i'+1)}|s = p^{(i')}|s'$ is a cherry picking sequence for $\mathcal{T}$ of weight at most $k$ and satisfying $C$.
Note that on line 19 of Algorithm 1 an element $b \neq a$ with $(x, b) \in G(\mathcal{T}', C')$ is chosen. The following lemma states that such an element does indeed exist. …
The proof of correctness of Algorithm 1 will be given in two parts. First, in Lemma 3.14 we show that for any feasible problem instance the algorithm will return a sequence. Second, in Lemma 3.15 we show that every sequence that the algorithm returns is a valid cherry picking sequence for the problem instance.
Lemma 3.14. When a cherry picking sequence of weight at most k that satisfies C exists, CherryPicking(T, k, C) from Algorithm 1 returns a non-empty set.
Proof. Let $W(k, u)$ be the claim that if a cherry picking sequence $s$ of weight at most $k$ exists that satisfies constraint set $C$ with $n^2 - |C| \le u$, then calling CherryPicking($\mathcal{T}$, $k$, $C$) will return a non-empty set. We will prove this claim with induction on $k$ and $n^2 - |C|$.
For the base case $k = 0$, if a cherry picking sequence of weight $k$ exists we must have that all trees are equal, so $|L(\mathcal{T})| = 1$. In this case a sequence is returned on line 7.
Note that we can never have a constraint set $C$ with $|C| > n^2$ because $C \subseteq L(\mathcal{T})^2$. Therefore … Assume that $W(k, u)$ is true for all cases where $0 \le k < k_b$ and all cases where $k = k_b$ and $n^2 - |C| \le u$. We consider the case where a cherry picking sequence $s$ of weight at most $k = k_b + 1$ exists for $\mathcal{T}$ that satisfies $C$ and $n^2 - |C| \le u + 1$. Lemma 3.10 implies that $k - P(C) \ge 0$, so the condition of the if-statement on line 2 will not be satisfied.
From Lemma 3.12 it follows that a CPS $s'$ of weight at most $k'$ exists for $\mathcal{T}'$ that satisfies $C'$. From the way Pick works it follows that either $k' < k$ or $n^2 - |C'| = n^2 - |C|$. If $|L(\mathcal{T}')| = 1$ then $\{()\}$ is returned and we have proven $W(k_b + 1, u + 1)$ to be true for this case. Because $s'$ satisfies $C'$, we know that $\pi_1(C') \subseteq L(\mathcal{T}')$. We know there is a $y \in N_{\mathcal{T}'}(s'_1)$ with $(s'_1, y) \notin C'$, because otherwise $s'_1$ would be picked by Pick. Also $s'$ satisfies $C' \cup \{(s'_1, y)\}$, which implies that $k' \ge P(C' \cup \{(s'_1, y)\}) > P(C')$, so the condition of the if-statement on line 10 will not be satisfied.
Note that we have $(s'_1, x) \in G(\mathcal{T}', C')$, $w_{\mathcal{T}'}(s'_1) > 0$ and $s'_1 \notin \pi_2(C')$. This implies that either the body of the if-statement on line 15 or the body of the else-if statement on line 18 will be executed.
Suppose the former is true. By Observation 3.4 we know that $s'$ satisfies $C' \cup \{(x, y)\}$ or $C' \cup \{(y, x)\}$. Because $(x, y) \in G(\mathcal{T}', C')$ we know $|C' \cup \{(x, y)\}| = |C' \cup \{(y, x)\}| = |C'| + 1$ and therefore $n^2 - |C' \cup \{(x, y)\}| = n^2 - |C' \cup \{(y, x)\}| \le u$. So by our induction hypothesis we know that at least one of the two subcalls will return a sequence, so the main call to the function will also return a sequence.
If instead the body of the else-if statement on line 18 is executed we know by Observation 3.2 that at least one of the constraint sets … By the induction hypothesis it now follows that at least one of the three subcalls will return a sequence, so the main call to the function will also return a sequence. So for both cases we have proven $W(k_b + 1, u + 1)$ to be true.
Lemma 3.15. Every element in the set returned by CherryPicking($\mathcal{T}$, $k$, $C$) from Algorithm 1 is a cherry picking sequence for $\mathcal{T}$ of weight at most $k$ that satisfies $C$.
Proof. Consider a certain call to CherryPicking($\mathcal{T}$, $k$, $C$). Assume that the lemma holds for all subcalls to CherryPicking. We claim that during the execution every element that is in $R$ is a partial cherry picking sequence for $\mathcal{T}'$ of weight at most $k'$ that satisfies $C'$. This is true at the start because $R$ starts as an empty set. At each point in the function where sequences are added to $R$, these sequences are elements returned by CherryPicking($\mathcal{T}'$, $k'$, $C''$) with $C' \subseteq C''$. By our assumption we know that all of these elements are cherry picking sequences for $\mathcal{T}'$ of weight at most $k'$ and satisfy $C''$. The latter implies that every element also satisfies $C'$ because $C' \subseteq C''$. The procedure now returns $\{p|r : r \in R\}$ and from Lemma 3.12 it follows that all elements of this set are cherry picking sequences for $\mathcal{T}$ of weight at most $k$ and satisfying $C$.

Runtime analysis
The key idea behind our runtime analysis is that at each recursive call in Algorithm 1, the measure $k - P(C)$ is decreased by a certain amount, and this leads to a bound on the number of times Algorithm 1 is called. It is straightforward to get a bound of $O(9^k)$. Indeed, it can be shown that for $k < |C|/2$ no feasible solution exists, and so the algorithm could stop whenever $2k - |C| < 0$. One call to the algorithm results in at most 3 subcalls, and in each subcall $|C|$ increases by at least one. Then the total number of subcalls to Algorithm 1 would be bounded by $O(3^{2k}) = O(9^k)$. By more careful analysis, and using the lower bound of $P(C)$ on the weight of a sequence satisfying $C$, we are able to improve this bound to $O(5^k)$.
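The simple bound can be spelled out as a short calculation. It uses only the two facts stated in this paragraph: every call makes at most 3 subcalls that each add at least one constraint, and a call with $|C| > 2k$ stops immediately. The recursion tree then has depth at most $2k$, so the number of calls is at most

```latex
\[
  \sum_{i=0}^{2k} 3^{i} \;=\; \frac{3^{2k+1}-1}{2} \;=\; O\!\left(3^{2k}\right) \;=\; O\!\left(9^{k}\right).
\]
```

The geometric sum is dominated by its last term, which is why the depth bound $2k$ alone determines the base of the exponent.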
We will now state some lemmas that are needed for the runtime analysis of the algorithm. We first show that the measure $k - P(C)$ will never increase at any point in the algorithm. The only time this may happen is during Pick, as the values of $k$ and $C$ are not otherwise changed, except at the point of a recursive call where constraints are added to $C$ (which cannot increase $k - P(C)$). Thus we first show that Pick cannot cause $k - P(C)$ to increase.
Lemma 3.16. …
Proof. We will prove with induction that for the variables $k_i$ and $C_i$ defined in the function body, we have $k_i - P(C_i) \le k - P(C)$ for all $i$, from which the result follows. Note that for $i = 0$ this is trivial. Now suppose the inequality holds for $i$. Then we also have …
The next lemma will be used later to show that a recursive call to CherryPicking always decreases $k - P(C)$ by a certain amount.
Lemma 3.17. …
Proof. Suppose $a \in \pi_1(C')$. Then $(a, z) \in C'$ for some $z \in N_{\mathcal{T}'}(x)$. If $w_{\mathcal{T}'}(a) > 0$ then $a$ satisfies the conditions in the if-statement on line 15, so line 19 would not be executed. If $w_{\mathcal{T}'}(a) = 0$ then we must have $|N_{\mathcal{T}'}(a) \setminus \{a\}| = 1$, so $N_{\mathcal{T}'}(a) \setminus \{a\} = \{x\}$, which implies that $z = x$. But $(a, x) \notin C'$ because $(x, a) \in G(\mathcal{T}', C')$, which contradicts that $(a, z) \in C'$. So $a \notin \pi_1(C')$. Because of symmetry, the same argument holds for $b$.
We now give the main runtime proof.
Lemma 3.18. … m).
Proof. Let the runtime of CherryPicking($\mathcal{T}$, $k$, $C$) be $t(n, k, C)$. We will prove with induction on $k - P(C)$ that $t(n, k, C) \le 5^{k - P(C) + 1}(k - P(C) + 1) f(n, m)$.
For $-1 \le k - P(C) \le 0$ the claim follows from the fact that the function will return on either line 3 or line 11 and therefore will not do any recursive calls. Now assume the claim holds for $-1 \le k - P(C) \le w$. Now consider an instance with $k - P(C) \le w + \psi$. Note that $k' - P(C') \le k - P(C)$ (Lemma 3.16). If the function CherryPicking does any recursive calls then it either executes the body of the if-clause on line 15, or the body of the else-if clause on line 18.
If the former is true then the function does 2 recursive calls. Each recursive call to the function CherryPicking($\mathcal{T}'$, $k'$, $C''$) is done with a constraint set $C''$ for which $|C''| = |C'| + 1$. Therefore for both subproblems $P(C'') \ge P(C') + \psi$ and also $k' - P(C'') \le k' - P(C') - \psi \le k - P(C) - \psi \le w$. By our induction hypothesis the running time of each of the subcalls is now bounded by $5^{k' - P(C'') + 1}(k' - P(C'') + 1) f(n, m)$. So therefore the total running time of this call is bounded by … So in this case we have proven the claim for $-1 \le k - P(C) \le w + \psi$.
If instead the body of the else-if statement on line 18 is executed then 3 recursive subcalls are made. Consider the first subcall CherryPicking($\mathcal{T}'$, $k'$, $C''$). We have $C'' = C' \cup \{(a, x)\}$.
Because $(x, a) \in G(\mathcal{T}', C')$ we have $(a, x) \notin C'$. Therefore $|C''| = |C'| + 1$. By Lemma 3.17 we know that $a \notin \pi_1(C')$, but we have $a \in \pi_1(C'')$, so $|\pi_1(C'')| = |\pi_1(C')| + 1$. Therefore … By our induction hypothesis we now know that the running time of this subcall is bounded by … Note that by symmetry the same holds for the second subcall.
Theorem 3.19. CherryPicking($\mathcal{T}$, $k$, $C$) from Algorithm 1 returns a cherry picking sequence of weight at most $k$ that satisfies $C$ if and only if such a sequence exists. The algorithm terminates in $O(5^k \cdot \mathrm{poly}(n, m))$ time.
Proof. This follows directly from Lemma 3.15, Lemma 3.14 and Lemma 3.18.
Figure 7: The difference between the tree-child reticulation number and the temporal reticulation number on the dataset generated in [15]. If no temporal network exists, the instance is shown under 'Not temporal'. Instances for which it could not be decided if they were temporal within 10 minutes (2.6% of the instances) are excluded.

Constructing non-temporal tree-child networks from binary trees
For every set of trees there exists a tree-child network that displays the trees. However, there are sets of trees for which no temporal network displaying the trees exists, so we cannot always find such a network. As shown in Fig. 7, approximately 5 percent of the instances used in [15] do not admit a temporal solution.
In this section we introduce theory that makes it possible to quantify how close a network is to being temporal. We can then pose the problem of finding the 'most' temporal network that displays a set of trees. … Note that every network has a semi-temporal labeling.
Definition 4.2. For a tree-child network $N$ with a semi-temporal labeling $t$, define $d(N, t)$ to be the number of hybridization arcs $(u, v)$ with $t(u) \neq t(v)$. We call these arcs non-temporal arcs. Call this number the temporal distance of $N$. Note that this number is finite for every network, because there always exist semi-temporal labelings.
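Counting non-temporal arcs for a given labeling is straightforward, as the sketch below shows. It assumes a network given as a list of arcs `(u, v)` and a labeling given as a dictionary; it does not check that the labeling is semi-temporal, since that definition is not reproduced here. The function name is hypothetical.

```python
from collections import Counter


def count_non_temporal_arcs(arcs, t):
    """Number of hybridization arcs (u, v) whose endpoints have different labels."""
    indeg = Counter(head for _, head in arcs)
    # An arc is a hybridization arc when its head has in-degree at least 2.
    return sum(1 for u, v in arcs if indeg[v] >= 2 and t[u] != t[v])


arcs = [("r", "u"), ("r", "v"), ("u", "a"), ("u", "h"),
        ("v", "h"), ("v", "b"), ("h", "c")]
temporal = {"r": 1, "u": 2, "v": 2, "h": 2, "a": 3, "b": 3, "c": 3}
shifted = {"r": 1, "u": 2, "v": 3, "h": 3, "a": 4, "b": 4, "c": 4}
```

With the labeling `temporal` the count is 0, and with `shifted` the arc `("u", "h")` is the only non-temporal arc, so the count is 1.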
The temporal distance is a way to quantify how close a network is to being temporal. The networks with temporal distance zero are the temporal networks. We can now state a more general version of the decision problem.
Semi-temporal hybridization
Instance: A set of $m$ trees $\mathcal{T}$ with $n$ leaves and integers $k$, $p$.
Question: Does there exist a tree-child network $N$ with $r(N) \le k$ and $d(N) \le p$?
There are other, possibly more biologically meaningful ways to define such a temporal distance. The reason for defining the temporal distance in this particular way is that an algorithm for solving the corresponding decision problem exists. For further research it could be interesting to explore if other definitions of temporal distance are more useful and whether the corresponding decision problems could be solved using similar techniques.
Van Iersel et al. presented an algorithm to solve the following decision problem in $O((8k)^k \cdot \mathrm{poly}(m, n))$ time.
Tree-child hybridization
Instance: A set of $m$ trees $\mathcal{T}$ with $n$ leaves and an integer $k$.
Question: Does there exist a tree-child network $N$ with $r(N) \le k$?
Notice that for p = k Semi-temporal hybridization is equivalent to Tree-child hybridization and for p = 0 it is equivalent to Temporal hybridization. The algorithm for Tree-child hybridization uses a characterization by Linz and Semple [11] using tree-child sequences, that we will describe in the next section. We describe a new algorithm that can be used to decide Semi-temporal hybridization. This algorithm is a combination of the algorithms for Tree-child hybridization and Temporal hybridization.

Tree-child sequences
First we will define the generalized cherry picking sequence (generalized CPS), which is called a cherry picking sequence in [15]. We call it generalized cherry picking sequence because it is a generalization of the cherry picking sequence we defined in Definition 2.4. For a tree T on X ⊆ X the sequence s defines a sequence of trees (T (0) , . . . , T (r) ) as follows: We will refer to T (r) as T (s), the tree obtained by applying sequence s to T .
A full generalized CPS on $X$ is a generalized CPS for a set $\mathcal{T}$ of trees if for each $T \in \mathcal{T}$ the tree $T(s)$ contains just one leaf and that leaf is in $\{x_{r+1}, \ldots, x_t\}$. The weight of a sequence $s$ for a set of trees $\mathcal{T}$ on $X$ is defined as $w_{\mathcal{T}}(s) = |s| - |X|$.
A generalized CPS is a tree-child sequence if $|s| \le r + 1$ and $y_j \ne x_i$ for all $1 \le i < j \le |s|$. If for such a tree-child sequence $|s| = r$, then $s$ is also called a tree-child sequence prefix.
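The ordering condition and the weight can be checked mechanically. The following is a minimal sketch, assuming a sequence is stored as a Python list of pairs `(x, y)` in which `y = None` stands for a final element $(x, -)$. The names `is_tree_child` and `weight` are illustrative only, and the sketch checks just the condition $y_j \ne x_i$ for $i < j$, not whether the sequence actually reduces a given set of trees.

```python
from typing import Hashable, Iterable, Optional, Sequence, Tuple

# (x, y) with y = None encoding a final element (x, -); this encoding is an assumption.
Pair = Tuple[Hashable, Optional[Hashable]]


def is_tree_child(seq: Sequence[Pair]) -> bool:
    """Return True if y_j != x_i holds for all i < j."""
    picked = set()  # first coordinates x_i seen so far
    for x, y in seq:
        if y is not None and y in picked:
            return False  # y_j equals an earlier x_i
        picked.add(x)
    return True


def weight(seq: Sequence[Pair], leaves: Iterable[Hashable]) -> int:
    """Weight |s| - |X| of a sequence for trees on leaf set X."""
    return len(seq) - len(set(leaves))


example = [("a", "b"), ("c", "b"), ("a", "d"), ("b", "d"), ("d", None)]
print(is_tree_child(example), weight(example, "abcd"))
```

In the example, the leaf `a` is picked twice, which is what makes the weight equal to one.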
It has been proven that a tree-child network displaying a set of trees T with r(N ) = k exists if and only if a tree-child sequence s with w(s) = k exists. The network can be efficiently computed from the corresponding sequence. The algorithm presented by Van Iersel et al. works by searching for such a sequence.
We will show that it is possible to combine their algorithm with the algorithm presented in Section 4. This yields an algorithm that decides Semi-temporal hybridization in $O(5^k (8k)^p \cdot k \cdot n \cdot m)$ time.
Definition 4.5. Let $s = ((x_1, y_1), \ldots, (x_t, -))$ be a full generalized CPS. An element $(x_i, y_i)$ is a non-temporal element when there are $j, k \in [t]$ with $i < j < k \le t$, $x_j \ne x_i$ and $x_k = x_i$.

The full proof of Lemma 4.7 is given in the appendix. We construct a tree-child network $N$ from $s$ in a similar way to [11, Proof of Theorem 2.2], working backwards through the sequence. At each stage when a pair $(x, y)$ is processed, we adjust the network to ensure there is an arc from the parent of $y$ to the parent of $x$. Our contribution is to also maintain a semi-temporal labeling $t$ on $N$. This can be done in such a way that for each pair $(x, y)$, at most one new non-temporal arc is created, and only if $(x, y)$ is a non-temporal element of $s$. This ensures that $d(N, t) \le d(s)$.

The full proof of Lemma 4.8 is given in the appendix. We construct the sequence in a similar way to [11, Lemma 3.4]. The key idea is that at any point the network will contain some pair of leaves $x, y$ that either form a cherry (where $x$ and $y$ share a parent) or a reticulated cherry (where the parent of $x$ is a reticulation, with an incoming edge from the parent of $y$). We process such a pair by appending $(x, y)$ to $s$, deleting an edge from $N$, and simplifying the resulting network. By being careful about the order in which we process reticulated cherries, we can ensure that we only add a non-temporal element to $s$ when we delete a non-temporal arc from $N$. This ensures that $d(s) \le d(N, t)$.

Observation 4.10. If a tree-child sequence $s$ has a subsequence $s'$ that is a generalized cherry picking sequence for $\mathcal{T}$, then $s$ is also a generalized cherry picking sequence for $\mathcal{T}$.

Proof. Suppose this is not true. Because $T(s)$ consists of a tree with only one leaf $x_{r+1}$, this implies that $L((T \setminus \{z\})(s)) \subseteq L(T(s))$. Let $i$ be the smallest index for which we have that $L((T \setminus \{z\})((x_1, y_1), \ldots, (x_i, y_i)))$ ⊆ L(T((x_1, y_1 […] This implies that $x_i \in L((T \setminus \{z\})((x_1, y_1), \ldots, (x_i, y_i)))$ but $x_i \notin$ L(T((x_1, y_1 […] $(x_{i-1}, y_{i-1})) \setminus \{z\}$. Let $p$ be the lowest vertex that is an ancestor of both $x_i$ and $y_i$ in the tree $(T \setminus \{z\})((x_1, y_1), \ldots, (x_{i-1}, y_{i-1}))$. Because $x_i$ and $y_i$ do not form a cherry in this tree, there is another leaf $q$ that is reachable from $p$. Because $q \in L(T((x_1, y_1), \ldots, (x_{i-1}, y_{i-1})) \setminus \{z\})$, $q$ is also reachable from the lowest common ancestor $p$ in $T((x_1, y_1), \ldots, (x_{i-1}, y_{i-1})) \setminus \{z\}$, contradicting the fact that $(x_i, y_i)$ is a cherry in this tree.
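Definition 4.5 can be made concrete with a short sketch. It assumes the same list-of-pairs encoding as elsewhere (with `None` for the second coordinate of a final element), and it assumes that $d(s)$ counts the non-temporal elements of $s$; that reading of $d(s)$ is an assumption here, since the defining statement is not reproduced in this section. The function names are illustrative.

```python
from typing import Hashable, List, Optional, Sequence, Tuple

Pair = Tuple[Hashable, Optional[Hashable]]


def non_temporal_indices(seq: Sequence[Pair]) -> List[int]:
    """Indices i for which some j < k exist after i with x_j != x_i and x_k == x_i."""
    xs = [x for x, _ in seq]
    result = []
    for i, xi in enumerate(xs):
        seen_other = False  # has a different first coordinate appeared after i?
        for later in xs[i + 1:]:
            if later != xi:
                seen_other = True
            elif seen_other:
                result.append(i)  # x_i reappears after an interruption
                break
    return result


def temporal_distance(seq: Sequence[Pair]) -> int:
    """Number of non-temporal elements (assumed reading of d(s))."""
    return len(non_temporal_indices(seq))


s = [("a", "b"), ("c", "b"), ("a", "d"), ("b", "d"), ("d", None)]
print(non_temporal_indices(s), temporal_distance(s))
```

In the example the picks of `a` are interrupted by a pick of `c`, so only the first element is non-temporal.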

Constraint sets
The new algorithm also uses constraint sets. However, because the algorithm searches for a generalized cherry picking sequence, we need to define what it means for such a sequence to satisfy a constraint set.

Definition 4.12. A generalized cherry picking sequence $s = ((x_1, y_1), \ldots, (x_k, y_k))$ satisfies constraint set $C$ if for every $(a, b) \in C$ there is an $i$ with $(x_i, y_i) = (a, b)$ and there is some $j \ne i$ with $x_j = a$.
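As an illustration of Definition 4.12, the following sketch tests whether a sequence satisfies a constraint set. It assumes sequences are lists of pairs and the constraint set is a Python set of pairs; the function name `satisfies` is hypothetical.

```python
from typing import Hashable, Iterable, Optional, Sequence, Tuple

Pair = Tuple[Hashable, Optional[Hashable]]


def satisfies(seq: Sequence[Pair], constraints: Iterable[Tuple[Hashable, Hashable]]) -> bool:
    """Check that each (a, b) occurs in seq and that a is picked at another index too."""
    for a, b in constraints:
        occurrences = [i for i, pair in enumerate(seq) if pair == (a, b)]
        picks_of_a = [j for j, (x, _) in enumerate(seq) if x == a]
        # Need an index i with (x_i, y_i) = (a, b) and a different index j with x_j = a.
        if not any(j != i for i in occurrences for j in picks_of_a):
            return False
    return True


seq = [("a", "z"), ("b", "z"), ("a", "c"), ("c", None)]
print(satisfies(seq, {("a", "z")}), satisfies(seq, {("b", "z")}))
```

Here `("a", "z")` is satisfied because `a` is picked again later, while `("b", "z")` is not, because `b` is picked only once.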
In Definition 2.1 the function $H(\mathcal{T})$ was defined for sets of binary trees with the same leaves. After applying a tree-child sequence not all trees will necessarily have the same leaves. Because of this, we generalize the definition of $H(\mathcal{T})$ to sets of binary trees.

Proof. From Lemma 4.14 it follows that either $(a, z)$ or $(z, a)$ is in $s$ and that either $(b, z)$ or $(z, b)$ is in $s$. Now let $s_i = (x_i, y_i)$ be the element of these that appears first in $s$. Now we have three cases:
1. If $x_i = a$, then $s_i = (a, z)$. Let $T \in \mathcal{T}$ be the tree in which $(b, z)$ is a cherry. Now $(b, z) \in T(s_1, \ldots, s_i)$. Because $(s_{i+1}, \ldots, s_{r+1})$ is a tree-child sequence for $\mathcal{T}(s_1, \ldots, s_i)$, this implies that there is some $j > i$ with $x_j = a$. Consequently $\{(a, z)\}$ is satisfied by $s$.
2. If $x_i = b$, then the same argument as in (1) can be applied to show that $\{(b, z)\}$ is satisfied by $s$.
3. If $x_i = z$, then we either have $y_i = a$ or $y_i = b$. Without loss of generality we can assume $y_i = a$. We still have $(b, z) \in T(s_1, \ldots, s_i)$, which implies that there is some $j > i$ with $(x_j, y_j) = (b, z)$ or $(x_j, y_j) = (z, b)$. Because $j > i$ and $s$ is tree-child, we know that $y_j \ne z$.
We show that we have $|S_z| - 1 \ge P(C_x)$. If $|C_z| = 0$, then $P(C_z) = 0$ and the inequality is trivial. If $|C_z| = 1$, then from the definition of constraint sets it follows that $|S_z| \ge 2$, […]

Next we prove that if a leaf $z$ is in $H(\mathcal{T})$ and appears in $s$ with all of its neighbors, then we can move all elements containing $z$ to the start of the sequence.

Proof. We can write $s' = ((x_1, y_1), \ldots, (x_{r+1}, -)) = s_a | s_b$ where $s_a$ consists of the elements $\{s_i : i \in I\}$ and $s_b$ is $s$ with the elements at indices in $I$ removed. First we prove that $s'$ is a tree-child sequence. Suppose that $s'$ is not a tree-child sequence. Then there are $i, j$ with $i < j$ such that $x_i = y_j$. Note that we can not have that $y_j = z$, because of how we constructed $s'$. This implies that both indices $i$ and $j$ are in $s_b$, implying that $s_b$ is not tree-child. But because $s_b$ is a subsequence of $s$ this implies that $s$ is not tree-child, which contradicts the conditions from the lemma. So $s'$ is tree-child.
We now prove that $s'$ fully reduces $\mathcal{T}$. Because $\mathcal{T}(s_a) = \mathcal{T} \setminus \{z\}$, from Lemma 4.11 it follows that $s_a | s$ is a generalized CPS for $\mathcal{T}$. Because $z \notin L(\mathcal{T}(s_a))$, $\mathcal{T}(s_a | s) = \mathcal{T}(s_a | s_b)$. So $s'$ is a generalized CPS for $\mathcal{T}$.
Finally, for every non-temporal element in $s'$ the corresponding element in $s$ is also non-temporal. We conclude that $d(s') \le d(s)$.

Trivial cherries
We will call a pair $(a, b)$ a trivial cherry if there is a $T \in \mathcal{T}$ with $a \in L(T)$ and for every tree $T \in \mathcal{T}$ that contains $a$, we have $(a, b) \in T$. They are called trivial cherries because they can be picked without limiting the possibilities for the rest of the sequence, as stated in the following lemma.

If $s = ((x_1, y_1), \ldots, (x_{r+1}, -))$ is a tree-child sequence for $\mathcal{T}$ of minimum length and $(a, b)$ is a trivial cherry in $\mathcal{T}$, then there is an $i$ such that $(x_i, y_i) = (a, b)$ or $(x_i, y_i) = (b, a)$. Also, there exists a tree-child sequence $s'$ for $\mathcal{T}$ with $|s'| = |s|$, $d(s') = d(s)$ and $s'_1 = (a, b)$.

Proof. This follows from Lemma 4.18.

[…] $(\mathcal{T}', k', C', p)$. Then a tree-child sequence $s$ of weight at most $k$ for $\mathcal{T}$ that satisfies $C$ exists if and only if a tree-child sequence $s'$ of weight at most $k'$ for $\mathcal{T}'$ that satisfies $C'$ exists. In this case $p|s'$ is a tree-child sequence for $\mathcal{T}$ of weight at most $k$ and satisfying $C$.

[Algorithm listing fragment: for $(x, y) \in P$ do … 27: if $|\{(x, z) \in C\}| = 1$ then … 29: procedure Pick($\mathcal{T}'$, $k'$, $C'$)]

The proof for this lemma is the same as for Lemma 3.12, but uses Lemma 4.18 instead of Lemma 3.7. The following lemma was proven in [15, Lemma 11].

If $((x_1, y_1), (x_2, y_2), \ldots, (x_{r+1}, -), \ldots, (x_t, -))$ is a full tree-child sequence of minimal length for $\mathcal{T}$ satisfying $C$ and $H(\mathcal{T}) \setminus \pi_2(C) = \emptyset$, then $(x_1, y_1)$ is a non-temporal element.

Proof. First observe that $x_1 \notin \pi_2(C)$ because the sequence satisfies $C$. Suppose $(x_1, y_1)$ is a temporal element. This implies that there is an $i$ such that for all $j < i$ we have $x_j = x_1$ and $x_k \ne x_1$ for all $k \ge i$. This implies that for every $T \in \mathcal{T}$ there is a $j < i$ such that $x_1$ is not in $T((x_j, y_j))$. Consequently $(x_j, y_j)$ is a cherry in $T$. Because this holds for every tree $T \in \mathcal{T}$ we must have $x_1 \in H(\mathcal{T}) \setminus \pi_2(C)$, contradicting the assumption that $H(\mathcal{T}) \setminus \pi_2(C) = \emptyset$.
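The definition of a trivial cherry given at the start of this subsection can be tested directly on small inputs. The sketch below assumes binary trees encoded as nested tuples of leaf labels (strings), treats a cherry as an unordered pair of leaves sharing a parent, and uses hypothetical helper names; it is meant only to illustrate the definition, not the data structures of the actual implementation.

```python
from typing import Set, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees (assumed encoding)


def leaves(tree: Tree) -> Set[str]:
    """Leaf labels of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def cherries(tree: Tree) -> Set[frozenset]:
    """Unordered pairs of leaves that share a parent."""
    if isinstance(tree, str):
        return set()
    found = set()
    leaf_children = [c for c in tree if isinstance(c, str)]
    for i, a in enumerate(leaf_children):
        for b in leaf_children[i + 1:]:
            found.add(frozenset((a, b)))
    for child in tree:
        found |= cherries(child)
    return found


def is_trivial_cherry(a: str, b: str, trees) -> bool:
    """(a, b) is trivial if a occurs in some tree and is in a cherry with b wherever it occurs."""
    containing = [t for t in trees if a in leaves(t)]
    return bool(containing) and all(frozenset((a, b)) in cherries(t) for t in containing)


t1 = (("a", "b"), ("c", "d"))
t2 = ((("a", "b"), "c"), "d")
t3 = ("c", "d")  # a tree that no longer contains a or b
print(is_trivial_cherry("a", "b", [t1, t2, t3]), is_trivial_cherry("c", "d", [t1, t2, t3]))
```

The pair `(a, b)` is trivial because `a` and `b` form a cherry in every tree that still contains `a`, whereas `(c, d)` is not a cherry in `t2`.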

The algorithm
We now present our algorithm for Semi-temporal hybridization. As with Tree-child hybridization, we split the algorithm into two parts: SemiTemporalCherryPicking (Algorithm 3) is the main recursive procedure, and Pick (Section 4.3) is the auxiliary procedure.
The key idea is that we try to follow the procedure for temporal sequences as much as possible. Algorithm 3 only differs from Algorithm 1 in the case where neither of the recursion conditions of Algorithm 1 apply, but there are still cherries to be processed. In this case, we can show that there are no trivial cherries, and hence Lemma 4.21 applies. Then we may assume there are at most $4k^*$ unique cherries, where $k^*$ is the original value of $k$ that we started with. In this case, we branch on adding $(x, y)$ or $(y, x)$ to the sequence, for any $x$ and $y$ that form a cherry. Any such pair will necessarily be a non-temporal element, and so we decrease $p$ by 1 in this case. A full proof of the following lemma is given in the appendix.
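The branching step described above can be sketched as follows. This is an illustrative fragment only: the function name `branch_on_cherries` is hypothetical, cherries are assumed to be given as pairs of leaf labels, and the rule that no branch is produced once the budget $p$ is exhausted is an assumption made here to reflect the requirement $d(s) \le p$.

```python
from typing import Hashable, Iterable, List, Tuple

Leaf = Hashable


def branch_on_cherries(
    cherries: Iterable[Tuple[Leaf, Leaf]], p: int
) -> List[Tuple[Tuple[Leaf, Leaf], int]]:
    """Return one branch per ordered pair, each with the non-temporal budget reduced by 1."""
    if p <= 0:
        return []  # assumed: no budget left for another non-temporal element
    branches = []
    for x, y in cherries:
        branches.append(((x, y), p - 1))  # pick x from the cherry
        branches.append(((y, x), p - 1))  # pick y from the cherry
    return branches


k_star = 1
cherry_list = [("a", "b"), ("c", "d"), ("e", "f"), ("g", "h")]  # 4 * k_star cherries
print(len(branch_on_cherries(cherry_list, p=2)))
```

With at most $4k^*$ cherries, this step creates at most $8k^*$ branches, which is where the factor $(8k)^p$ in the running time comes from.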
Lemma 4.23. Let $s'$ be a tree-child sequence prefix, $\mathcal{T}$ a set of trees with the same leaves and define $\mathcal{T}' := \mathcal{T}(s')$. Suppose $k, p \in \mathbb{N}$ and $C \subseteq L(\mathcal{T}')^2$. When a generalized cherry picking sequence $s$ exists that satisfies $C$ and such that $s'|s$ is a tree-child sequence for $\mathcal{T}$ with $w_{\mathcal{T}}(s'|s) \le k$ and $d(s) \le p$, SemiTemporalCherryPicking($\mathcal{T}'$, $k$, $k^*$, $p$, $C$) from Algorithm 3 returns a non-empty set.

Lemma 4.24. Let $s'$ be a tree-child sequence prefix, $\mathcal{T}$ a set of trees with the same leaves and define $\mathcal{T}' := \mathcal{T}(s')$. Suppose $k, p \in \mathbb{N}$ and $C \subseteq L(\mathcal{T}')^2$. If $S$ is returned by a call to SemiTemporalCherryPicking($\mathcal{T}'$, $k$, $k^*$, $p$, $C$), then for every $s \in S$, the sequence $s'' = s'|s$ is a tree-child sequence for $\mathcal{T}$ with $d(s) \le p$ and $w(s'') \le k$.
The proof of this lemma is similar to the proof of Lemma 3.14 using Lemma 4.20.
Proof. This can be proven by combining the proofs from Lemma 3.18 and [15,Lemma 11].
Theorem 4.26. SemiTemporalCherryPicking($\mathcal{T}$, $k$, $k$, $p$, $\emptyset$) from Algorithm 3 returns a cherry picking sequence of weight at most $k$ if and only if such a sequence exists. The algorithm terminates in $O(5^k \cdot (8k)^p \cdot k \cdot n \cdot m)$ time.
Proof. This follows directly from Lemma 4.24, Lemma 4.23 and Lemma 4.25.
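To get a feeling for how the parameter $p$ affects the bound in Theorem 4.26, the parameter-dependent factors can simply be evaluated. The sketch below is a numeric illustration only: it drops the polynomial factors in $n$ and $m$ and all hidden constants, so the numbers say nothing about actual running times, and the function names are hypothetical.

```python
def semi_temporal_factor(k: int, p: int) -> int:
    """Parameter-dependent factor 5^k * (8k)^p * k from Theorem 4.26."""
    return 5 ** k * (8 * k) ** p * k


def tree_child_factor(k: int) -> int:
    """Parameter-dependent factor (8k)^k of the tree-child algorithm."""
    return (8 * k) ** k


for p in range(0, 3):
    print(p, semi_temporal_factor(3, p), tree_child_factor(3))
```

For $k = 3$ the factor grows from 375 at $p = 0$ to 9000 at $p = 1$, compared with $24^3 = 13824$ for the tree-child bound.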

Constructing temporal networks from two non-binary trees
The algorithms described in the previous sections only work when all input trees are binary. In this section we introduce the first algorithm for constructing a minimum temporal hybridization network for a set of two non-binary input trees. The algorithm is based on [13] and has time complexity $O(6^k k! \cdot k \cdot n^2)$.
We say that a binary tree $T'$ is a refinement of a non-binary tree $T$ when $T$ can be obtained from $T'$ by contracting some of the edges. Now we say that a network $N$ displays a non-binary tree $T$ if there exists a binary refinement $T'$ of $T$ such that $N$ displays $T'$. Now the hybridization number $h_t(\mathcal{T})$ can be defined for a set of non-binary trees $\mathcal{T}$ like in the binary case. Note that computing the minimum size of a neighbor cover is an NP-hard problem itself. However, if $|\mathcal{T}|$ is constant the problem can be solved in polynomial time. Note that for binary trees this definition is equivalent to the definition given in Definition 2.3.
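The notion of refinement can be illustrated on small trees. The sketch below assumes rooted trees encoded as nested tuples of leaf labels. The helper `contract` contracts the edge from the root to one internal child, and `is_refinement` uses the standard cluster characterization (a tree on the same leaves refines another exactly when it contains all of its clusters), which is an assumption brought in for the example rather than something stated in this paper. All names are illustrative.

```python
from typing import Set, Union

Tree = Union[str, tuple]  # a leaf label, or a tuple of subtrees (assumed encoding)


def contract(tree: tuple, child_index: int) -> tuple:
    """Contract the edge from the root of `tree` to its child at `child_index`."""
    child = tree[child_index]
    if isinstance(child, str):
        raise ValueError("only edges to internal vertices are contracted here")
    # The grandchildren become children of the root.
    return tree[:child_index] + child + tree[child_index + 1:]


def clusters(tree: Tree) -> Set[frozenset]:
    """Leaf sets below each vertex of the tree."""
    if isinstance(tree, str):
        return {frozenset([tree])}
    result, below = set(), set()
    for child in tree:
        sub = clusters(child)
        result |= sub
        below |= max(sub, key=len)  # the largest cluster is the child's own leaf set
    result.add(frozenset(below))
    return result


def is_refinement(candidate: Tree, tree: Tree) -> bool:
    """True if every cluster of `tree` is also a cluster of `candidate`."""
    return clusters(tree) <= clusters(candidate)


binary = (("a", "b"), "c")
print(contract(binary, 0), is_refinement(binary, ("a", "b", "c")))
```

Contracting the edge above the cherry `(a, b)` yields the star tree on `a, b, c`, of which every binary tree on these leaves is a refinement.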
Next Definition 2.1 is generalized to non-binary trees.

Proof. Note that this is a generalization of Theorem 2.6 to the case of non-binary input trees and the proof is essentially the same. A cherry picking sequence with weight $k$ can be constructed from a temporal network with reticulation number $k$ in the same way as in the proof of Theorem 2.6. The construction of a temporal network $N$ from a cherry picking sequence $s$ is also very similar to the binary case: for cherry picking sequence $s_1, \ldots, s_t$, define $N_{t+1}$ to be the network only consisting of a root, the only leaf of $\mathcal{T} \setminus \{s_1, \ldots, s_t\}$ and an edge between the two. For each $i$ let $S_i$ be a minimal neighbor cover of $s_i$ in $\mathcal{T} \setminus \{s_1, \ldots, s_{i-1}\}$. Now obtain $N_i$ from $N_{i+1}$ by adding node $s_i$, subdividing $(p_x, x)$ for every $x \in S_i$ with node $q_x$, adding an edge $(q_x, s_i)$ and finally suppressing all nodes with in- and out-degree one. It can be shown that $r(N) = w_{\mathcal{T}}(s)$.

The algorithm relies on some theory from [13], which we will introduce first.

Proof. Let $p$ be the first index at which a cherry is reduced in $s$. Let $(a, b)$ be one of the cherries that is reduced at index $p$. Now there will be a cherry in $\mathcal{T}$ that contains both $a$ and $b$. Let $C$ be one of the minimal clusters that is contained in this cherry. Let $x$ be the element of $C$ that occurs last in $s$. Now let $c_1, \ldots, c_t$ be the elements from $C \setminus \{x\}$ ordered by their index in $s$. Now we claim that for any permutation $\sigma$ of $[t]$ the sequence $s' = (c_{\sigma(1)}, \ldots, c_{\sigma(t)}) | (s \setminus (C \setminus \{x\}))$ is a cherry picking sequence for $\mathcal{T}$ and $w_{\mathcal{T}}(s') \le w_{\mathcal{T}}(s)$.
Let $i$ be the index of the last element of $C \setminus \{x\}$ in $s$. Suppose that $s'$ is not a CPS for $\mathcal{T}$. Let $j$ be the smallest index for which $s'_j \notin H(\mathcal{T} \setminus \{s'_1, \ldots, s'_{j-1}\})$. Let $T \in \mathcal{T}$ be such that $s'_j$ is not in a cherry in $T \setminus \{s'_1, \ldots, s'_{j-1}\}$. Choose $k$ such that $s_k = s'_j$. Now there are three cases:
• Suppose $j > i$, then $k = j$ and $\{s_1, \ldots, s_k\} = \{s'_1, \ldots, s'_j\}$. This implies that $s'_j \in H(\mathcal{T} \setminus \{s'_1, \ldots, s'_{j-1}\})$, which contradicts our assumption.
• Otherwise, suppose $s'_j \in \{c_1, \ldots, c_t\}$. Then $j \le t$. Now $s_k$ has to be in a cherry in $T \setminus \{s_1, \ldots, s_{k-1}\}$. Because no cherries are reduced before index $i$ in $s$ this means that $s'_j$ is in a cherry in $T$. Because no cherries are reduced in $s'$ before index $t$, this implies that the same cherry is still in $T \setminus \{s'_1, \ldots, s'_{j-1}\}$, which contradicts our assumption.
• Otherwise we must have $j \le i$. Because no cherries are reduced before index $i$ in $s$ this means that $s'_j$ is in a cherry $Q$ in $T$. If this cherry contains a leaf $y$ with $s'_w = y$ for $w > j$, then $s'_j$ is still in a cherry in $T \setminus \{s'_1, \ldots, s'_{j-1}\}$, contradicting our assumption, so this can not be true. However, that implies that the neighbors of $s_k$ in $T \setminus \{s_1, \ldots, s_{k-1}\}$ are all elements of $\{c_1, \ldots, c_t\}$. Let $v$ be the second largest number such that $c_v$ is one of these neighbors. Let $q$ be the index of $c_v$ in $s$. Now cherry $Q$ will be reduced by $s$ at index $\max(q, j) < i$, which contradicts the fact that $C$ is contained in a cherry of $T$ that is reduced first by $s$.
Now to prove that $w_{\mathcal{T}}(s') \le w_{\mathcal{T}}(s)$, we will prove that for $s'_j = s_k$ we have […]. Note that for $j \ge i$ this is trivial, so assume $j < i$. If $s_j \in C \setminus \{x\}$, then $w_{\mathcal{T} \setminus \{s_1, \ldots, s_{j-1}\}}(s_j) \ge w_{\mathcal{T}}(s_j)$ because no cherries are reduced before $i$, which implies that no new elements are added to cherries before $i$. For the same reason we must have $s_j \in H(\mathcal{T})$. Because there are no $x \in H(\mathcal{T})$ with $w_{\mathcal{T}}(x) = 0$ we must have $w_{\mathcal{T}}(s_j) = 1$. So $w_{\mathcal{T} \setminus \{s_1, \ldots, s_{k-1}\}}(s_k) \le w_{\mathcal{T} \setminus \{s_1, \ldots, s_{j-1}\}}(s_j) = 1$.

Bounding the number of minimal clusters
By Lemma 5.7, in the construction of a cherry picking sequence we can restrict ourselves to only appending elements from minimal clusters. We use the following theory from [13] to bound the number of minimal clusters.

Proof. Let $C$ be a minimal cluster of $\mathcal{T}$. Let $x$ be an element of $C$ that is maximal in $C$ with respect to the partial ordering '$\xrightarrow{\mathcal{T}}$' (where $x \xrightarrow{\mathcal{T}} y$ means that $y$ is 'greater than or equal to' $x$). Now suppose that $x$ is not a terminal. Then there is a $y$ such that $x \xrightarrow{\mathcal{T}} y$. However then $y \in C$, but this contradicts the fact that $x$ is a maximal element in $C$ with respect to '$\xrightarrow{\mathcal{T}}$'. Because this is a contradiction, $x$ has to be a terminal.
Lemma 5.11. Let $\mathcal{T}$ be a set of trees with $h_t(\mathcal{T}) \ge 1$ containing no zero-weight leaves. Let $N$ be a network that displays $\mathcal{T}$. Then $\mathcal{T}$ contains at most $2r(N)$ terminals that are not directly below a reticulation node.
Proof. We reformulate the proof from [13, Lemma 3]. We use the fact that for each terminal $x$ one of the following conditions holds: the parent $p_x$ of $x$ in $N$ is a reticulation (condition 1), or a reticulation is reachable by a directed tree-path from the parent $p_x$ of $x$ (condition 2). This is always true, because otherwise another leaf $y$ is reachable from $p_x$, implying that $x \xrightarrow{\mathcal{T}} y$, which contradicts that $x$ is a terminal. Let $R$ be the set of reticulation nodes in $N$ and let $W$ be the set of terminals in $\mathcal{T}$ that are not directly beneath a reticulation. We describe a mapping $F : W \to R$ such that each reticulation $r$ is mapped to at most $d^-(r)$ times. Note that for each $x \in W$ condition 2 holds. For these elements let $F(x) = y$ where $y$ is a reticulation reachable from $p(x)$ by a tree-path.
Note that there can not be a path from $p(x)$ to $y$ containing only tree arcs when $x \ne y$ are both in $H(\mathcal{T})$, because then $x \xrightarrow{\mathcal{T}} y$, which contradicts that $x$ is a terminal. It follows that each reticulation $r$ can be mapped to at most $d^-(r)$ times: at most once per incoming edge. Then for the set of terminals $\Omega$ we have $|\Omega| \le \sum_{r \in R} d^-(r) \le \sum_{r \in R} (1 + (d^-(r) - 1)) \le |R| + k \le 2k$.
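The final chain of inequalities is plain arithmetic on the in-degrees of the reticulations. A small sketch, assuming the usual convention that the reticulation number is $k = \sum_{r \in R} (d^-(r) - 1)$ and that every reticulation has in-degree at least two (neither is restated in this section), makes the three quantities explicit; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def terminal_bound(in_degrees: Sequence[int]) -> Tuple[int, int, int]:
    """Return (sum of in-degrees, |R| + k, 2k) for the given reticulation in-degrees."""
    if any(d < 2 for d in in_degrees):
        raise ValueError("a reticulation has in-degree at least 2")
    k = sum(d - 1 for d in in_degrees)  # assumed definition of the reticulation number
    total = sum(in_degrees)             # sum over r in R of d^-(r)
    return total, len(in_degrees) + k, 2 * k


total, r_plus_k, two_k = terminal_bound([2, 3, 2])
print(total, r_plus_k, two_k)
```

For in-degrees 2, 3 and 2 we get $k = 4$, so the sum of in-degrees is $7 = |R| + k$, which is at most $2k = 8$ because each reticulation contributes at least one to $k$.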
Lemma 5.12. Let $\mathcal{T}$ be a set of nonbinary trees such that $h_t(\mathcal{T}) \ge 1$. Then any set $S$ of terminals in $\mathcal{T}$ with $|S| \ge 2h_t(\mathcal{T}) + 1$ contains at least one element $x \in H(\mathcal{T})$ such that there is a cherry picking sequence $s$ for $\mathcal{T}$ with $w_{\mathcal{T}}(s) = h_t(\mathcal{T})$ and $s_1 = x$.
Proof. Let $N$ be a temporal network that displays $\mathcal{T}$ such that $r(N) = h_t(\mathcal{T})$ with corresponding cherry picking sequence $s$. From Lemma 5.11 it follows that at most $2r(N)$ terminals exist in $\mathcal{T}$ that are not directly below a reticulation. So there is an $x \in S$ that is directly below a reticulation. Now let $\mathcal{T}'$ be the set of all binary trees displayed by $N$. Note that $s$ is a cherry picking sequence for $\mathcal{T}'$. Let $i$ be such that $s_i = x$. Because $x$ is directly below a reticulation in $N$, we have $s_j \notin N_{\mathcal{T}'}(x)$, which implies by Lemma 3.6 that $s' = (s_i, s_1, \ldots, s_{i-1}, s_{i+1}, \ldots)$ is a cherry picking sequence for

Run-time analysis
Lemma 5.13. The running time of CherryPicking($\mathcal{T}$, $k$) from Algorithm 5 is $O(6^k k! \cdot k \cdot n^2)$ if $\mathcal{T}$ is a set consisting of two nonbinary trees.
Proof. Let $f(n)$ be an upper bound for the running time of the non-recursive part of the function. We claim that the maximum running time $t(n, k)$ for running the algorithm on trees with $n$ leaves and parameter $k$ is bounded by $6^k k!\, k\, f(n)$.
For $k = 0$ it is clear that this claim holds. Now we will prove that it holds for any call, by assuming that the bound holds for all subcalls.
If $|S| > 2k$, then the algorithm branches into $2k + 1$ subcalls. The total running time can then be bounded by
$(2k + 1)\,t(n, k - 1) + f(n) \le (2k + 1)\,6^{k-1}(k - 1)!\,(k - 1)\,f(n) + f(n)$.
If the condition of the if-statement on line 23 is true, then for that $q$ the function does 3 subcalls with $k$ reduced by one. So the recursive part of the total running time for this $q$ is bounded by […]. If the condition on line 23 holds then there is at most one $d \in D$ with $|d| \le 2$. Using this information we can bound the total running time of the subcalls that are done for $q$ in the else clause by […] (1).

[Algorithm 5, listing fragment: $S \leftarrow$ set of terminals in $\mathcal{T}$ … 15: if $|S| > 2k$ then … 16: $S \leftarrow$ subset of $S$ of size $2k + 1$ … 17: end for … return $\{s|x : x \in R\}$ … 38: end procedure]

Note that (2) follows from the fact that $x \mapsto x\,6^{k-x+1}$ is a decreasing function for $x \in [1, \infty)$. So for each $q$ the running time of the subcalls is bounded by $(k - 1)!\,(k - 1)\,f(n)\,2^{k-1}3^k$. Now the total running time is bounded by […]. Because the non-recursive part of the function can be implemented to run in $O(n^2)$ time, the total running time of the function is $O(6^k k! \cdot k \cdot n^2)$.
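The inductive step for the case $|S| > 2k$ can be sanity-checked numerically. The sketch below measures time in units of $f(n)$ and verifies that $(2k + 1)$ subcalls at the claimed bound for $k - 1$, plus one unit of non-recursive work, stay within the claimed bound for $k$. It only covers this one branch of the analysis, and the function names are hypothetical.

```python
from math import factorial


def claimed_bound(k: int) -> int:
    """Claimed bound 6^k * k! * k on t(n, k), in units of f(n)."""
    return 6 ** k * factorial(k) * k


def cost_of_terminal_branch(k: int) -> int:
    """(2k + 1) subcalls with parameter k - 1 plus the non-recursive part."""
    return (2 * k + 1) * claimed_bound(k - 1) + 1


for k in range(1, 6):
    print(k, cost_of_terminal_branch(k), claimed_bound(k))
```

For instance, for $k = 2$ the branch costs $5 \cdot 6 \cdot 1 \cdot 1 + 1 = 31$ units, well within the claimed $6^2 \cdot 2! \cdot 2 = 144$.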
Lemma 5.14. Let $\mathcal{T}$ be a set of non-binary trees. If $h_t(\mathcal{T}) \le k$, then CherryPicking($\mathcal{T}$, $k$) from Algorithm 5 returns a cherry picking sequence for $\mathcal{T}$ of weight at most $k$.
Proof. First we will prove by induction on $k$ that if $h_t(\mathcal{T}) \le k$ then a sequence is returned. For $k = 0$ this is true because if $h_t(\mathcal{T}) = 0$, then as long as $|L(\mathcal{T})| > 1$ we have $|H(\mathcal{T})| > 0$ and all elements of $H(\mathcal{T})$ have zero weight, so they are removed on line 4. After that $|L(\mathcal{T})| = 1$, so an empty sequence will be returned, which proves that the claim is true for $k = 0$. Now assume that the claim holds for all $k' < k$ and assume that $h_t(\mathcal{T}) \le k$. We will prove that a sequence is returned by CherryPicking($\mathcal{T}$, $k$) in this case. After removing an element $x$ with weight zero on line 4 we still have $h_t(\mathcal{T}) \le k$ (Lemma 5.6). If $|L(\mathcal{T})| = 1$, an empty sequence is returned. If this is not the case then $0 < h_t(\mathcal{T}) \le k$, so the else if is not executed.
If $|S| > 2k$ then from Lemma 5.12 it follows that for $S' \subseteq S$ with $|S'| = 2k + 1$ there is at least one $x \in S'$ such that $h_t(\mathcal{T} \setminus \{x\}) \le k - 1$. Now from the induction hypothesis it follows that CherryPicking($\mathcal{T} \setminus \{x\}$, $k - 1$) returns at least one sequence, which implies that $R$ is not empty. Because of that the main call will return at least one sequence, which proves that the claim holds for $k$.
The only thing left to prove is that every returned sequence is a cherry picking sequence for $\mathcal{T}$. This follows from the fact that only elements from $H(\mathcal{T})$ are appended to $s$ and that $R$ consists of cherry picking sequences for $\mathcal{T} \setminus \{s_1, \ldots, s_t\}$.

Experimental results
We developed implementations of Algorithm 1, Algorithm 5 and Algorithm 3, which are freely available [4]. To analyse the performance of the algorithms we made use of the dataset generated in [15] for experiments with an algorithm for the construction of tree-child networks with a minimal hybridization number.

Algorithm 1
In Fig. 8 the running time of Algorithm 1 on the dataset from [15] is shown. The results are consistent with the bound on the running time that was proven in Section 3. Also, the algorithm is able to compute solutions for relatively high values of k, indicating that the algorithm performs well in practice.
The authors of [15] also provide an implementation of their algorithm for tree-child networks. The implementation contains several optimizations to improve the running time. One of them is an operation called cluster reduction [10]. The implementation is also multi-threaded. In Fig. 9 we provide a comparison of the running times of the tree-child algorithm with Algorithm 1. In this comparison we let both implementations use a single thread, because our implementation of the algorithm for computing the hybridization number does not support multithreading. The implementation could however be modified to solve different subproblems in different threads, which will probably also result in a significant speed-up. In this comparison we see that the difference in time complexity between the $O((8k)^k)$ algorithm and the $O(5^k)$ algorithm is also observable in practice.

Figure 9: Difference between the running time of Algorithm 1 and the algorithm for tree-child networks from [15]. Axis: running time tree-child − running time temporal (s).

Algorithm 5
We used the software from [15] to generate random binary problem instances and afterwards randomly contracted edges in the trees to obtain non-binary problem instances. We used this dataset to test the running time of Algorithm 5. The results are shown in Fig. 10. We see that the algorithm is usable in practice and has a reasonable running time.

Figure 11: Difference between the running time of Algorithm 3 and the algorithm for constructing tree-child networks from [15] on all non-temporal instances in the dataset from [15].

Algorithm 3
Algorithm 3 was tested on all non-temporal instances in the dataset from [15]. In Fig. 11 the running time of Algorithm 3 is compared to that of the algorithm from [15]. The data show that the algorithm from [15] is often faster than Algorithm 3. However, there are also instances for which Algorithm 3 is much faster. Hence, in practice it can be worthwhile to run this algorithm on instances that cannot be solved by the algorithm from [15] in a reasonable time. It should also be noted that we only tested the algorithms on a relatively small dataset.

A Omitted proofs
Lemma 4.7. Let $s$ be a full tree-child sequence for $\mathcal{T}$. Then there exists a network $N$ with semi-temporal labeling $t$ such that $r(N) \le w_{\mathcal{T}}(s)$ and $d(N, t) \le d(s)$.
Proof. This can be proven by constructing a tree-child network from the tree-child sequence as described in [11, Proof of Theorem 2.2]. We will show that a semi-temporal labeling satisfying our constraints exists for the resulting network. We will write $s = (x_1, y_1), \ldots, (x_r, y_r), (x_{r+1}, -)$. Now we merge all consecutive elements $(x_i, y_i), (x_{i+1}, y_{i+1}), \ldots, (x_{i+j}, y_{i+j})$ for which $x_i = x_{i+1} = \cdots = x_{i+j}$ into one element $(x_i, \{y_i, y_{i+1}, \ldots, y_{i+j}\})$ and call the resulting sequence $s'$. Call an element of this sequence temporal if all corresponding elements in $s$ are temporal.
Call it non-temporal if all corresponding elements in $s$ are non-temporal. Observe that it can not happen that some of the corresponding elements are temporal while some are non-temporal.