A distance-based model for convergent evolution

Convergent evolution is an important process in which independent species evolve similar features, usually over a long period of time. It occurs in many different species across the tree of life, and is often caused by the fact that species have to adapt to similar environmental niches. In this paper, we introduce and study properties of a distance-based model for convergent evolution in which we assume that two ancestral species converge for a certain period of time within a collection of species that have otherwise evolved according to an evolutionary clock. Under these assumptions, we obtain a distance on the collection that is a modification of an ultrametric distance arising from an equidistant phylogenetic tree. As well as characterising when this modified distance is a tree metric, we give conditions in terms of the model's parameters for when it is still possible to recover the underlying tree and also its height, even in cases where the modified distance is not a tree metric.


Introduction
One of the central questions in the area of phylogenetics is to develop models and algorithms to reconstruct the evolutionary history of a set of species (or taxa) in terms of a phylogenetic tree (Felsenstein 2004). Typically, this is an edge-weighted, rooted tree whose leaves correspond to the set of species in question. A key evolutionary assumption that underpins most phylogenetic models and phylogenetic tree reconstruction methods is that once a speciation has occurred, the child species are conditionally independent and diverge from each other at a fairly constant rate. This assumption implies that pairs of species with an older least common ancestor will be at a greater evolutionary distance from each other (in terms of either their molecular sequences or morphology) than those with a more recent least common ancestor (Zuckerkandl and Pauling 1965).
Even so, there are several biological processes which do not conform to this general rule. For example, in virus evolution, recombination events might introduce large chunks of genetic material from one virus into another (Pérez-Losada et al. 2015), bacterial species are known to exchange genes via horizontal gene transfer (Dagan and Martin 2006), and both plant and animal species can exchange genetic material through processes such as hybridisation and introgression (Mallet 2005). In extreme cases this latter process is sufficient to cause "reverse speciation" (Rudman and Schluter 2016), a process that is of increasing interest in conservation genetics (Bohling 2016; Seehausen 2006). These types of evolution are commonly known as reticulate evolution, and there is a growing body of literature concerning the use of a generalisation of phylogenetic trees called (rooted) phylogenetic networks to model and represent such evolutionary scenarios (Bapteste et al. 2013).
Another important evolutionary process, which may cause species to become more alike even in cases where they are not more genetically similar, is convergent evolution. It occurs in many different species across the tree of life and, in contrast to reticulate evolution, no genetic material is exchanged (see e.g. Sackton and Clark (2019) for a recent review). Convergent evolution usually acts over long periods of time, and less attention has been paid to modelling this gradual process.
In Sumner et al. (2012), a very general Markov model of character evolution was proposed that allows species to either diverge or converge according to different partitions of the species set in different epochs. These convergence-divergence models were explored further in Mitchell (2016) and Mitchell et al. (2018), where questions of identifiability and distinguishability of different subclasses of the model were addressed. Perhaps unsurprisingly, they found that not all convergence-divergence models could be distinguished from each other or from the standard tree model.
Despite these advances, to the best of our knowledge no one has investigated models allowing convergence of species from the perspective of distance data. Evolutionary distances are commonly inferred from species data, for example, from morphological features or molecular sequences, and there are several approaches to reconstruct evolutionary trees from such distances (e.g. Felsenstein 2004, Chapter 11). Here we are interested in modelling the situation where convergence between species or lineages has occurred over a sustained period and acts so as to reduce the evolutionary distance between species, or at least to slow down their rate of divergence. In particular, we seek to understand when such processes will leave a discernible trace in distance data, and to characterise the situations where the underlying tree topology is recoverable.

Fig. 1 The points r, s represent two ancestral species which have converged to give rise to the species r′, s′, respectively, over a period of time that is proportional to the length of the two bold paths.
An illustration of our distance-based convergence model is presented in Fig. 1 (see Sect. 3 for precise definitions). We start with an edge-weighted, rooted phylogenetic tree $\mathcal{T}$ on a collection X of species in which every path from the root $\rho_{\mathcal{T}}$ of $\mathcal{T}$ to a leaf has the same length (i.e. $\mathcal{T}$ is an equidistant tree (Semple and Steel 2003)). Such trees are commonly used to represent the evolutionary history of a collection of species that have undergone "clock-like" evolution.
Under this assumption, the evolutionary distance $d_{\mathcal{T}}(x, y)$ between any pair of species $x, y \in X$ is given by the length of the shortest path between x and y in $\mathcal{T}$, so that $d_{\mathcal{T}}(x, y)$ is proportional to twice the time that has passed since the last common ancestor of x and y speciated. To model convergence, we assume that at some time in the past two ancestral species (or lineages), represented by two points r, s in $\mathcal{T}$ at the same distance from $\rho_{\mathcal{T}}$, have been subject to convergence for a certain period of time. We represent this by two disjoint paths of equal length in $\mathcal{T}$ (shown in bold) that start at r and s and end at two points $r'$ and $s'$, respectively. In particular, we also assume that $r'$ and $s'$ diverge from one another after the convergence period has ended.
Using the information given by this model, we adjust the distance $d_{\mathcal{T}}$ on X to obtain a new distance $d_\varepsilon$ on X as follows. For any $x, y \in X$ that lie below the points r and s, we subtract from $d_{\mathcal{T}}(x, y)$ an amount that is proportional to the period of time during which the ancestors of x and y have undergone convergence, as determined by the two disjoint paths. Note that $d_\varepsilon$ should be thought of as a distance that is inferred directly from the set of species X. Such distances could, for example, be obtained by computing some distance between molecular sequences representing the species in X, from morphological data, or from broader genomic features such as gene presence/absence. The mathematical aim is then to understand when we can recover $\mathcal{T}$ from $d_\varepsilon$. Interestingly, as we shall see, even though in our model of convergence $d_\varepsilon$ will no longer necessarily correspond to an equidistant tree, it may still be possible to recover the topology of $\mathcal{T}$ from $d_\varepsilon$, even in cases where $d_\varepsilon$ does not correspond to any tree.
We now summarise the contents of the rest of this paper. In Sect. 2, we present some basic definitions and facts concerning phylogenetic trees, ultrametrics and tree metrics. In Sect. 3, we present our model of convergence and characterise when the map $d_\varepsilon$ above is in fact a distance (Lemma 1). In Sect. 4, we then focus on the question of how to recover the topology of $\mathcal{T}$ from $d_\varepsilon$. To do this, we consider three-leaved subtrees of $\mathcal{T}$ called triplets, and characterise when $d_\varepsilon$ is triplet respecting, that is, when it is possible to recover the triplets of $\mathcal{T}$ (and hence the topology of $\mathcal{T}$) from $d_\varepsilon$ (Theorem 3). In Sect. 5, under the assumption that $d_\varepsilon$ is triplet respecting, we characterise when $d_\varepsilon$ is either an ultrametric or a tree metric (Theorem 8). In Sect. 6, we then focus on when it is also possible to recover the height of $\mathcal{T}$ from $d_\varepsilon$ in case we are able to recover $\mathcal{T}$ from $d_\varepsilon$ (Theorem 11). We conclude in Sect. 7 with a brief discussion of possible future directions.

Preliminaries
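In this paper, X is a finite set (of species or taxa) and $n = |X|$. We assume $n \geq 3$. A distance on X is a map $d: X \times X \to \mathbb{R}_{\geq 0}$ such that $d(x, x) = 0$ and $d(x, y) = d(y, x) > 0$ for all distinct $x, y \in X$. A distance d on X is
• a metric if it satisfies the triangle inequality, i.e. $d(x, y) \leq d(x, z) + d(z, y)$ for all $x, y, z \in X$;
• a tree metric if it satisfies the four-point condition, i.e. for all (not necessarily distinct) $x, y, u, v \in X$,
$$d(x, y) + d(u, v) \leq \max\{d(x, u) + d(y, v),\ d(x, v) + d(y, u)\}; \qquad (1)$$
• an ultrametric if for all $x, y, z \in X$,
$$d(x, y) \leq \max\{d(x, z),\ d(y, z)\}. \qquad (2)$$
Note that an ultrametric is a tree metric, a tree metric is a metric, and that there are tree metrics that are not ultrametrics, metrics that are not tree metrics, and distances that are not metrics. Also note that if d is a metric on X, then d is a tree metric if and only if d satisfies the 4-point condition for all pairwise distinct $x, y, u, v \in X$.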
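The three conditions above can be tested by brute force on a finite distance table. The following Python sketch is a minimal illustration, assuming the distance is supplied as a dictionary keyed by unordered pairs; the helper names `from_table`, `is_metric`, `is_tree_metric` and `is_ultrametric` are hypothetical and not taken from the paper. Following the remark above, the four-point condition is only checked on pairwise distinct quadruples, which suffices once d is known to be a metric; for such a quadruple, condition (1) says exactly that the two largest of the three pair sums coincide.

```python
from itertools import combinations, permutations

TOL = 1e-9  # tolerance for floating-point comparisons

def from_table(table):
    """Turn {frozenset({x, y}): value} into a symmetric function with d(x, x) = 0."""
    return lambda x, y: 0.0 if x == y else table[frozenset((x, y))]

def is_metric(d, X):
    """Triangle inequality d(x, y) <= d(x, z) + d(z, y) for all distinct x, y, z."""
    return all(d(x, y) <= d(x, z) + d(z, y) + TOL for x, y, z in permutations(X, 3))

def is_tree_metric(d, X):
    """Four-point condition (1) on pairwise distinct quadruples; assumes d is a metric."""
    for x, y, u, v in combinations(X, 4):
        sums = sorted([d(x, y) + d(u, v), d(x, u) + d(y, v), d(x, v) + d(y, u)])
        if sums[2] - sums[1] > TOL:  # the two largest sums must coincide
            return False
    return True

def is_ultrametric(d, X):
    """Inequality (2): d(x, y) <= max{d(x, z), d(y, z)} for all distinct x, y, z."""
    return all(d(x, y) <= max(d(x, z), d(y, z)) + TOL for x, y, z in permutations(X, 3))

# Usage: an ultrametric from the equidistant tree ((a, b), (c, d)), and a
# "square" metric that violates the four-point condition.
ultra = from_table({frozenset("ab"): 2, frozenset("cd"): 4, frozenset("ac"): 6,
                    frozenset("ad"): 6, frozenset("bc"): 6, frozenset("bd"): 6})
square = from_table({frozenset("ab"): 2, frozenset("cd"): 2, frozenset("ac"): 1,
                     frozenset("ad"): 1, frozenset("bc"): 1, frozenset("bd"): 1})
print(is_metric(ultra, "abcd"), is_tree_metric(ultra, "abcd"), is_ultrametric(ultra, "abcd"))
print(is_metric(square, "abcd"), is_tree_metric(square, "abcd"))
```

The "square" example illustrates the strict inclusions noted above: it is a metric but not a tree metric.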
A (binary) phylogenetic tree T (on X) is a rooted tree with root $\rho = \rho_T$ and leaf set X such that the degree of $\rho$ is two and the degree of every other non-leaf vertex of T is three. An edge-weighted phylogenetic tree $\mathcal{T} = (T, w)$ is a phylogenetic tree $T = (V, E)$ together with a weight function $w: E \to \mathbb{R}_{>0}$, which assigns a positive weight to each edge of T. To an edge-weighted phylogenetic tree $\mathcal{T}$ on X, we associate the distance $d_{\mathcal{T}} = d_{(T,w)}$ on X given by setting, for $x, y \in X$, $d_{\mathcal{T}}(x, y)$ equal to the length of the path in T between x and y (i.e. the sum of the edge weights taken over the edges in this path). Note that $d_{\mathcal{T}}$ is necessarily a tree metric.
We shall also consider an edge-weighted phylogenetic tree $\mathcal{T} = (T, w)$ as a continuous object, that is, we regard an edge e of T with weight w(e) as a real closed interval of length w(e). In particular, a point in $\mathcal{T}$ is an element of some edge of T. Note that vertices of T are considered as points in $\mathcal{T}$, and we will use the terms vertex and point interchangeably when the meaning is clear from the context. When we want to emphasise that a point is not a vertex, we shall say that it is inside an edge. Note that there is a natural ordering of the points in $\mathcal{T}$. Given two points a, b in $\mathcal{T}$, we say that a is above b (or b is below a) if either $a = b$ or the path from the root of $\mathcal{T}$ to b (thought of as a continuous object) contains a; if b is below a and not equal to a, we say that b is strictly below a. Moreover, we define the least common ancestor $\mathrm{lca}(a, b) = \mathrm{lca}_T(a, b)$ of a and b to be the lowest point in $\mathcal{T}$ that is above both a and b. Note that $\mathrm{lca}(a, a) = a$.
An equidistant tree is an edge-weighted phylogenetic tree $\mathcal{T} = (T, w)$ such that, for any two leaves x and y of T, the length of the path in $\mathcal{T}$ from the root $\rho$ to x equals the length of the path from $\rho$ to y. Given such a tree $\mathcal{T}$, the height $h(a) = h_{\mathcal{T}}(a)$ of a point a of $\mathcal{T}$ is the length of the path from a to any leaf below a. We refer to the height of the root of $\mathcal{T}$ as the height of $\mathcal{T}$. We call an equidistant tree generic if $h(u) \neq h(v)$ for every pair u, v of distinct non-leaf vertices of T.
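To make the notions of height, equidistance and genericity concrete, here is a small Python sketch under an encoding of our own choosing (not the paper's): the tree is given by hypothetical child-to-parent pointers together with the weight of the edge above each child. Heights are computed bottom-up, and the function returns None as soon as two root-to-leaf paths differ in length.

```python
from itertools import combinations

def heights_if_equidistant(parent, weight, tol=1e-9):
    """Return {vertex: height} if the tree is equidistant, else None.

    parent: child -> parent vertex; weight: child -> weight of the edge to its parent.
    """
    children = {}
    for child, par in parent.items():
        children.setdefault(par, []).append(child)
    root = next(v for v in children if v not in parent)
    height = {}

    def visit(v):
        """Set height[v]; return False if the subtree below v is not equidistant."""
        if v not in children:  # v is a leaf
            height[v] = 0.0
            return True
        ok = all([visit(c) for c in children[v]])  # list, so every child is visited
        via_child = [height[c] + weight[c] for c in children[v]]
        height[v] = via_child[0]
        return ok and max(via_child) - min(via_child) <= tol

    return height if visit(root) else None

def is_generic(height, leaves, tol=1e-9):
    """Distinct non-leaf vertices must have distinct heights."""
    inner = [h for v, h in height.items() if v not in leaves]
    return all(abs(a - b) > tol for a, b in combinations(inner, 2))

# Hypothetical tree ((x, y), (z, t)) with u = lca(x, y) and v = lca(z, t).
parent = {"x": "u", "y": "u", "z": "v", "t": "v", "u": "rho", "v": "rho"}
weight = {"x": 1.0, "y": 1.0, "z": 2.0, "t": 2.0, "u": 3.0, "v": 2.0}
h = heights_if_equidistant(parent, weight)
print(h["u"], h["v"], h["rho"])             # 1.0 2.0 4.0
print(is_generic(h, {"x", "y", "z", "t"}))  # True
```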

Convergence scenarios
In this section, we formally define our convergence model, which is based on the following parameters: (1) a real number $\varepsilon \geq 0$, (2) a generic equidistant tree $\mathcal{T} = (T, w)$ on X with height $h > 0$, (3) two real numbers $\alpha$ and $\beta$ with $0 < \alpha < \beta < h$, neither of which is equal to the height of a vertex of T, and (4) a convergence set R in $\mathcal{T}$, that is, a set of four distinct points, say $r, r', s, s'$, each of them inside some edge of T, such that r, s have height $\beta$, the points $r', s'$ have height $\alpha$, the point $r'$ is below r, and the point $s'$ is below s. We call a triple $(\mathcal{T}, R, \varepsilon)$ consisting of some choice of these parameters a convergence scenario (on X) (see Fig. 2 for an example). From now on, $\mathcal{T} = (T, w)$, $\alpha$, $\beta$ and $\varepsilon$ will be as described above.
We now define some additional terminology for convergence scenarios. For such a scenario $(\mathcal{T}, R, \varepsilon)$, we call the points in R with height $\beta$ the top points of R and the points with height $\alpha$ the bottom points of R. We define $\mathrm{lca}(R) = \mathrm{lca}_T(R)$ to be the least common ancestor of the top points of R (which is necessarily a vertex of T). We say that two distinct elements x, y in X are below (strictly below) R if x is below one top (bottom) point of R and y is below the other top (bottom) point of R. In addition, given two elements x, y in X below R, we define
$$h_R(x, y) = \max\{h(\mathrm{lca}_T(r', x)),\ h(\mathrm{lca}_T(s', y))\},$$
where $r'$ (respectively $s'$) is the bottom point of R such that x and $r'$ (respectively $s'$ and y) are both below the same top point of R. Finally, we associate a map $d_\varepsilon = d_{(\mathcal{T},R)}: X \times X \to \mathbb{R}$ to $(\mathcal{T}, R, \varepsilon)$ as follows. Let $x, y \in X$. If x, y are below R (so that they are necessarily distinct), then set
$$d_\varepsilon(x, y) = d_{\mathcal{T}}(x, y) - 2\varepsilon\,(\beta - h_R(x, y)),$$
and otherwise set $d_\varepsilon(x, y) = d_{\mathcal{T}}(x, y)$.

To help illustrate these concepts, consider the convergence scenario pictured in Fig. 2. Then t and z are below R, but not strictly below R, whereas x and y are strictly below R. Furthermore, h(lca … Since y and z are not both below R, we obtain $d_\varepsilon(y, z) = d_{\mathcal{T}}(y, z) = 3$.
Note that $d_\varepsilon \leq d_{\mathcal{T}}$, that $d_\varepsilon$ is symmetric, and that $d_\varepsilon$ may take negative values (i.e. it is what is commonly called a dissimilarity map (Semple and Steel 2003)). Loosely speaking, we can interpret the map $d_\varepsilon$ as follows. For any two taxa x and y, the quantity $d_{\mathcal{T}}(x, y)$ is proportional to the time during which x and y have diverged from one another. If x and y are below R, we then subtract $2\varepsilon(\beta - h_R(x, y))$ from this quantity to model the fact that some ancestors of x and y have converged for a period of time that is proportional to $\beta - h_R(x, y)$.
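As a sanity check on the definition of $d_\varepsilon$, the following Python sketch evaluates it on a small hypothetical convergence scenario of our own (it is not the scenario of Fig. 2): the tree ((x, y), (z, t)) with $h(\mathrm{lca}(x, y)) = 1$, $h(\mathrm{lca}(z, t)) = 2$, root height 4, $\alpha = 1.5$ and $\beta = 3$, where r and $r'$ lie on the edge above lca(x, y), s lies on the edge above lca(z, t), and $s'$ lies on the pendant edge of z. As an assumption of this sketch, a point inside an edge is encoded by the vertex at the lower end of that edge together with the point's height.

```python
# Hypothetical tree ((x, y), (z, t)): u = lca(x, y), v = lca(z, t), root rho.
PARENT = {"x": "u", "y": "u", "z": "v", "t": "v", "u": "rho", "v": "rho"}
HEIGHT = {"x": 0.0, "y": 0.0, "z": 0.0, "t": 0.0, "u": 1.0, "v": 2.0, "rho": 4.0}

def ancestors(v):
    """Return [v, parent(v), ..., root]."""
    path = [v]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def lca(a, b):
    """Least common ancestor of two vertices."""
    above_a = set(ancestors(a))
    return next(w for w in ancestors(b) if w in above_a)

def d_T(x, y):
    """Leaf distance in an equidistant tree: twice the height of the lca."""
    return 2 * HEIGHT[lca(x, y)]

# A point inside an edge is encoded as (c, h): it lies on the edge from vertex c
# up to PARENT[c], at height h.
def is_below(leaf, point):
    return point[0] in ancestors(leaf)

def h_lca(point, leaf):
    """Height of lca(point, leaf)."""
    c, h = point
    return h if is_below(leaf, point) else HEIGHT[lca(c, leaf)]

def d_eps(x, y, R, eps, beta):
    """d_T(x, y) - 2 eps (beta - h_R(x, y)) if x, y are below R, else d_T(x, y)."""
    r, r_bot, s, s_bot = R  # top points r, s and bottom points r', s'
    if is_below(x, s) and is_below(y, r):
        x, y = y, x  # make x the leaf below r
    if is_below(x, r) and is_below(y, s):
        h_R = max(h_lca(r_bot, x), h_lca(s_bot, y))
        return d_T(x, y) - 2 * eps * (beta - h_R)
    return d_T(x, y)

# alpha = 1.5, beta = 3: r, r' on the edge above u; s on the edge above v;
# s' on the pendant edge above z.
R = (("u", 3.0), ("u", 1.5), ("v", 3.0), ("z", 1.5))
for pair in (("x", "z"), ("x", "t"), ("z", "t")):
    print(pair, d_eps(*pair, R, eps=1.0, beta=3.0))
```

Here x, y and z are strictly below R, whereas t is below R but not strictly, so $h_R(x, t) = 2$ and the correction for the pair x, t is smaller than for the pair x, z. The pair z, t is not below R, so its distance is left unchanged.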
In general, given $d_\varepsilon$, we are interested in recovering the topology of the phylogenetic tree $\mathcal{T}$ that gives rise to $d_\varepsilon$. Since in real applications $d_\varepsilon$ will be non-negative (e.g. it could be a distance matrix computed from a multiple sequence alignment), in the rest of this paper we shall focus on the case where $d_\varepsilon$ is a distance. We conclude this section by characterising when this is the case.

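Lemma 1 Let $(\mathcal{T}, R, \varepsilon)$ be a convergence scenario. Then $d_\varepsilon$ is a distance if and only if
$$\varepsilon < \frac{h(\mathrm{lca}(R))}{\beta - \alpha}.$$
Proof Suppose that this inequality holds. Since $d_{\mathcal{T}}$ is a distance on X, to show that $d_\varepsilon$ is a distance it clearly suffices to show that $d_\varepsilon(x, y) > 0$ for all $x, y \in X$ below R. Let $x, y \in X$ be below R. Then $\mathrm{lca}_T(x, y) = \mathrm{lca}(R)$ and $h_R(x, y) \geq \alpha$, and so
$$d_\varepsilon(x, y) = 2h(\mathrm{lca}(R)) - 2\varepsilon(\beta - h_R(x, y)) \geq 2h(\mathrm{lca}(R)) - 2\varepsilon(\beta - \alpha) > 0.$$
Conversely, if $d_\varepsilon$ is a distance, then pick some distinct $x, y \in X$ strictly below R. Then $h_R(x, y) = \alpha$, so that $0 < d_\varepsilon(x, y) = 2h(\mathrm{lca}(R)) - 2\varepsilon(\beta - \alpha)$, and the inequality follows.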
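The argument in the proof reduces Lemma 1 to one number: over all pairs below R, the smallest value of $d_\varepsilon$ is $2h(\mathrm{lca}(R)) - 2\varepsilon(\beta - \alpha)$, attained by pairs strictly below R. The Python sketch below simply evaluates this bound and the resulting threshold on $\varepsilon$; the function names and the sample values $h(\mathrm{lca}(R)) = 4$, $\alpha = 1.5$, $\beta = 3$ are illustrative assumptions.

```python
def lemma1_bound(h_lca_R, alpha, beta):
    """Threshold in Lemma 1: d_eps is a distance iff eps < h(lca(R)) / (beta - alpha)."""
    return h_lca_R / (beta - alpha)

def smallest_value_below_R(h_lca_R, alpha, beta, eps):
    """Minimum of d_eps(x, y) over pairs below R; it is attained when x and y are
    strictly below R, where d_T(x, y) = 2 h(lca(R)) and h_R(x, y) = alpha."""
    return 2 * h_lca_R - 2 * eps * (beta - alpha)

# With h(lca(R)) = 4, alpha = 1.5 and beta = 3 the threshold is 8/3.
for eps in (1.0, 2.5, 3.0):
    print(eps, eps < lemma1_bound(4.0, 1.5, 3.0), smallest_value_below_R(4.0, 1.5, 3.0, eps))
```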
Note that there exist convergence scenarios $(\mathcal{T}, R, \varepsilon)$ for which $d_\varepsilon$ is a distance but not a metric (see for example Sect. 5, Fig. 4(8), where … and $\varepsilon = 5$). There does not appear to be a simple characterisation along the lines of Lemma 1 for when $d_\varepsilon$ is a metric, although in Sect. 5 we shall give such a characterisation in case $d_\varepsilon$ enjoys some additional properties.

Recovering the topology of the tree from a convergence scenario
In this section, given a convergence scenario $(\mathcal{T}, R, \varepsilon)$ such that $d_\varepsilon = d_{(\mathcal{T},R)}$ is a distance, we are interested in understanding when we can recover the topology of $\mathcal{T}$ from $d_\varepsilon$.
To this end, we begin by recalling some useful facts concerning phylogenetic trees. A triplet is a phylogenetic tree with three leaves. If x, y, z are the leaves of a triplet and the least common ancestor of x and y in the triplet is not the root of the triplet, then we denote the triplet by ((x, y), z). Given a phylogenetic tree T with leaf set X, we can induce a triplet on every subset of X of size three by taking the tree spanned by the leaves in this subset and suppressing all non-root vertices contained in precisely two edges. We let R(T) denote the set of triplets on X induced by T in this way. Note that R(T) completely determines T (Semple and Steel 2003, Theorem 6.4.1), i.e. there is no phylogenetic tree T′ on X different from T with R(T′) = R(T).
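The set R(T) is straightforward to compute: for each 3-subset of leaves in a binary tree, exactly one pair has a least common ancestor strictly below the other two, and that pair forms the cherry of the induced triplet. A minimal Python sketch, assuming the rooted tree is given as hypothetical child-to-parent pointers and writing each triplet with its cherry in sorted order:

```python
from itertools import combinations

def induced_triplets(parent, leaves):
    """R(T) as a set of ((a, b), c) with a < b, for a rooted tree given by parent pointers."""
    def ancestors(v):
        path = [v]
        while path[-1] in parent:
            path.append(parent[path[-1]])
        return path

    def lca_depth(a, b):
        above_a = set(ancestors(a))
        anc = next(v for v in ancestors(b) if v in above_a)
        return len(ancestors(anc))  # number of vertices from the lca up to the root

    triplets = set()
    for a, b, c in combinations(sorted(leaves), 3):
        options = [((a, b), c), ((a, c), b), ((b, c), a)]
        depths = [lca_depth(*pair) for pair, _ in options]
        # In a binary tree exactly one pair has a strictly deepest lca: the cherry.
        triplets.add(options[depths.index(max(depths))])
    return triplets

parent = {"x": "u", "y": "u", "z": "v", "t": "v", "u": "rho", "v": "rho"}
print(sorted(induced_triplets(parent, ["x", "y", "z", "t"])))
```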
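Now, for the convergence scenario $(\mathcal{T}, R, \varepsilon)$, we associate a set of triplets to $d_\varepsilon$ by putting
$$R(d_\varepsilon) = \{((x, y), z) : x, y, z \in X \text{ distinct and } d_\varepsilon(x, y) < \min\{d_\varepsilon(x, z), d_\varepsilon(y, z)\}\},$$
and we say that $d_\varepsilon$ is triplet respecting if $R(d_\varepsilon) = R(T)$. In case $\varepsilon = 0$, we have $d_0 = d_{\mathcal{T}}$, which is triplet respecting. More generally, whenever $d_\varepsilon$ is triplet respecting, we can recover T from $R(d_\varepsilon)$ using, for example, the Build algorithm (Semple and Steel 2003).

Fig. 3 The configurations used within the proof of Theorem 3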
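The recovery step can be sketched as follows. The Python code below computes $R(d)$ for a distance given as a function, and then runs a compact version of the Build algorithm as described in Semple and Steel (2003): join a and b whenever some triplet ((a, b), c) lies inside the current leaf set, split the leaf set into the connected components of the resulting graph, and recurse; a single component on two or more leaves means the triplets are incompatible. Trees are returned as nested tuples, an encoding chosen only for this illustration, and the function names are hypothetical.

```python
from itertools import combinations

def triplets_from_distance(d, leaves):
    """R(d): all ((p, q), o) with d(p, q) < min{d(p, o), d(q, o)}."""
    out = set()
    for a, b, c in combinations(sorted(leaves), 3):
        for (p, q), o in (((a, b), c), ((a, c), b), ((b, c), a)):
            if d(p, q) < min(d(p, o), d(q, o)):
                out.add(((p, q), o))
    return out

def build(triplets, leaves):
    """Build: a nested-tuple tree displaying all triplets, or None if incompatible."""
    leaves = sorted(leaves)
    if len(leaves) == 1:
        return leaves[0]
    comp = {v: v for v in leaves}  # union-find over the current leaf set

    def find(v):
        while comp[v] != v:
            comp[v] = comp[comp[v]]  # path halving
            v = comp[v]
        return v

    present = set(leaves)
    for (a, b), c in triplets:
        if {a, b, c} <= present:  # only triplets living inside this leaf set
            comp[find(a)] = find(b)
    blocks = {}
    for v in leaves:
        blocks.setdefault(find(v), []).append(v)
    if len(blocks) == 1:
        return None  # connected graph on >= 2 leaves: no compatible tree
    children = []
    for block in blocks.values():
        sub = build(triplets, block)
        if sub is None:
            return None
        children.append(sub)
    return tuple(children)

# Usage: the ultrametric of a hypothetical equidistant tree ((x, y), (z, t)).
D = {frozenset("xy"): 2, frozenset("zt"): 4}
d = lambda a, b: 0 if a == b else D.get(frozenset((a, b)), 8)
leaves = ["x", "y", "z", "t"]
print(build(triplets_from_distance(d, leaves), leaves))  # (('t', 'z'), ('x', 'y'))
```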
Interestingly, there are convergence scenarios $(\mathcal{T}, R, \varepsilon)$ where $d_\varepsilon$ is a distance that is not a tree metric (and therefore not an ultrametric), but where $d_\varepsilon$ is still triplet respecting (e.g. in Fig. 2 take any $\varepsilon$ with $0 < \varepsilon < \frac{1}{3}$). Thus, in some cases we can recover the tree T from $d_\varepsilon$ even though $d_\varepsilon$ is not a tree metric. Hence, it is of interest to characterise when $d_\varepsilon$ is a triplet respecting distance.
To this end, we begin with a useful but somewhat technical observation. Given a convergence scenario $(\mathcal{T}, R, \varepsilon)$ on X, we associate a convergence scenario with each triple $Y = \{x, y, z\} \subseteq X$ for which $((x, y), z) \in R(T)$ and $d_\varepsilon|_Y \neq d_{\mathcal{T}}|_Y$, as follows. Let Q denote the edge-weighted triplet ((x, y), z) whose edge weighting is induced by the edge weighting of $\mathcal{T}$ (so that Q is an edge-weighted phylogenetic tree on Y). Note that since $d_\varepsilon|_Y \neq d_{\mathcal{T}}|_Y$, at least two elements of Y must be below R, and so the top points of R are contained in Q. Even so, the bottom points of R will not necessarily be contained in Q. However, by interchanging the roles of x and y if necessary, we obtain a triple $(Q, R^* = \{r, s, r^*, s^*\}, \varepsilon)$ as pictured in one of Fig. 3i-iv by giving the labels r and s to the top points of R and giving the points $r^*$ and $s^*$ height equal to $\max\{h_r, h_s, \alpha\}$, where $h_r$ and $h_s$ are the heights of the points where the paths from $r'$ and $s'$ to $\mathrm{lca}(R)$ join Q, respectively. We refer to $(Q, R^*, \varepsilon)$ as the restriction of $(\mathcal{T}, R, \varepsilon)$ to Y. For example, for the triplet ((x, y), z) coming from the phylogenetic tree in Fig. 1, we would obtain the convergence scenario as in Fig. 3ii, where $r^*$ and $s^*$ have height equal to the height of the vertex v in Fig. 1 (since $h_r = h(v)$, $h_s = h(s')$ and v is higher than $s'$).
The proof of the following is routine case checking, and so we omit it.
Lemma 2 Let $(\mathcal{T}, R, \varepsilon)$ be a convergence scenario on X and let $Y = \{x, y, z\}$ be a triple of distinct elements of X such that ((x, y) …

We call a convergence scenario $(\mathcal{T}, R, \varepsilon)$ a cherry scenario if the points in R are all contained in the two edges of some cherry of T, that is, of two leaves of T that are adjacent to a common vertex.
Theorem 3 Suppose that $(\mathcal{T}, R, \varepsilon)$ is a convergence scenario on X such that $d_\varepsilon$ is a distance. If $(\mathcal{T}, R, \varepsilon)$ is a cherry scenario, then $d_\varepsilon$ is triplet respecting; otherwise, $d_\varepsilon$ is triplet respecting if and only if, for all distinct $x, z \in X$ strictly below R and $y \in X$ such that ((x, y) … which implies that Inequality (3) holds.
Conversely, suppose that $((x, y), z) \in R(T)$, that x, z are strictly below R and that Inequality (3) holds. We will show that $d_\varepsilon(x, y) < \min\{d_\varepsilon(x, z), d_\varepsilon(y, z)\}$. … Then consider the convergence scenario $(Q, R^*, \varepsilon)$ that is obtained by restricting $(\mathcal{T}, R, \varepsilon)$ to Y, which, without loss of generality, must be as in one of the configurations in Fig. 3i-iv. By Lemma 2, $d_\varepsilon|_Y = d^*_\varepsilon$, and so, considering each of the cases (i)-(iv) in Fig. 3, we have $d_\varepsilon(x, y) < \min\{d_\varepsilon(x, z), d_\varepsilon(y, z)\}$ since in (i) by (3) …

As a consequence of Theorem 3, we immediately obtain the following simple condition, which guarantees that the distance $d_\varepsilon$ is triplet respecting.

Corollary 4 …

Proof This follows from Theorem 3 since, if $x, z \in X$ are strictly below R and $y \in X$ is such that $((x, y), z) \in R(T)$, then $(d_{\mathcal{T}}(x, z) - d_{\mathcal{T}}(x, y))/2$ is equal to the length of the path in $\mathcal{T}$ between $\mathrm{lca}(R)$ and $\mathrm{lca}_T(x, y)$, and this path must contain at least one edge of T which does not contain a leaf.
Note that the converse of Corollary 4 does not hold. For example, consider the convergence scenario depicted in Fig. 4(10), where … Then the bound on $\varepsilon$ given in Theorem 3 is 1 (it is given by the three elements x, y, z), and for $\varepsilon = \frac{1}{2}$, $d_\varepsilon$ is a triplet respecting distance. So if the converse of Corollary 4 held, then for $\delta = \frac{1}{1000}$ we would have $\frac{1}{2} = \varepsilon < \delta = \frac{1}{1000}$, which is impossible.
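Triplet respect can also be explored numerically. The Python sketch below checks it by brute force for a small hypothetical convergence scenario of our own (not one of the paper's figures): the tree ((x, y), (z, t)) with $h(\mathrm{lca}(x, y)) = 1$, $h(\mathrm{lca}(z, t)) = 2$, root height 4, $\alpha = 1.5$ and $\beta = 3$, where x, y and z are strictly below R and t is below R but not strictly below R. Working through the definition of $d_\varepsilon$ by hand gives $d_\varepsilon(x, z) = d_\varepsilon(y, z) = 8 - 3\varepsilon$ and $d_\varepsilon(x, t) = d_\varepsilon(y, t) = 8 - 2\varepsilon$, while $d_\varepsilon(x, y) = 2$ and $d_\varepsilon(z, t) = 4$ are unchanged; the sketch hard-codes these formulas.

```python
from itertools import combinations

LEAVES = ["t", "x", "y", "z"]
# R(T) for the hypothetical tree ((x, y), (z, t)), written with sorted pairs.
TREE_TRIPLETS = {(("x", "y"), "t"), (("x", "y"), "z"), (("t", "z"), "x"), (("t", "z"), "y")}

def d_eps_example(eps):
    """Hand-derived d_eps for the hypothetical scenario (alpha = 1.5, beta = 3)."""
    table = {("x", "y"): 2.0, ("t", "z"): 4.0,
             ("x", "z"): 8 - 3 * eps, ("y", "z"): 8 - 3 * eps,
             ("t", "x"): 8 - 2 * eps, ("t", "y"): 8 - 2 * eps}
    return lambda a, b: 0.0 if a == b else table[tuple(sorted((a, b)))]

def triplets_from_distance(d, leaves):
    """R(d): all ((p, q), o) with d(p, q) < min{d(p, o), d(q, o)}."""
    out = set()
    for a, b, c in combinations(sorted(leaves), 3):
        for (p, q), o in (((a, b), c), ((a, c), b), ((b, c), a)):
            if d(p, q) < min(d(p, o), d(q, o)):
                out.add(((p, q), o))
    return out

def is_triplet_respecting(eps):
    return triplets_from_distance(d_eps_example(eps), LEAVES) == TREE_TRIPLETS

for eps in (0.5, 1.0, 1.2, 1.5, 2.5):
    print(eps, is_triplet_respecting(eps))
```

In this example the triple {x, z, t} is the first to fail: $R(d_\varepsilon)$ contains ((z, t), x) exactly when $4 < 8 - 3\varepsilon$, i.e. $\varepsilon < 4/3$, which lies below the threshold $8/3$ from Lemma 1. For larger $\varepsilon$ the wrong triplet ((x, z), t) appears, even though $d_\varepsilon$ is still a distance.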

Triplet respecting metrics
In this section, in case a convergence scenario gives rise to a triplet respecting distance $d_\varepsilon$, we want to characterise under which circumstances $d_\varepsilon$ is a metric, a tree metric or an ultrametric. Note that these are proper subclasses, since there are examples of triplet respecting distances $d_\varepsilon$ where: (1) $d_\varepsilon$ is a distance but not a metric (in Fig. 4 …, and $\varepsilon = 5$), (2) $d_\varepsilon$ is a metric but not a tree metric (in Fig. 5 …, and $\varepsilon, \delta > 0$ are both small), and (3) $d_\varepsilon$ is a tree metric but not an ultrametric (in Fig. 4 … and $\varepsilon, \delta > 0$ are both small).
We start by characterising when a triplet respecting distance is a metric.
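Theorem 5 Suppose that $(\mathcal{T}, R, \varepsilon)$ is a convergence scenario such that $d_\varepsilon$ is a triplet respecting distance. Then $d_\varepsilon$ is a metric if and only if, for all distinct $x, z \in X$ strictly below R and $y \in X$ such that $((x, y), z) \in R(T)$ and $\mathrm{lca}_T(x, y)$ is not below a bottom point of R, …

Proof (⇐) Suppose $Y = \{x, y, z\}$ is a triple of elements of X. To show that $d_\varepsilon$ is a metric, we need to show that the triangle inequalities (A) $d_\varepsilon(x, y) \leq d_\varepsilon(x, z) + d_\varepsilon(y, z)$, (B) $d_\varepsilon(x, z) \leq d_\varepsilon(x, y) + d_\varepsilon(y, z)$ and (C) $d_\varepsilon(y, z) \leq d_\varepsilon(x, y) + d_\varepsilon(x, z)$ hold. If $d_\varepsilon(p, q) = d_{\mathcal{T}}(p, q)$ for all $p, q \in Y$, then these all hold since $d_{\mathcal{T}}$ restricted to Y is a metric. So suppose this is not the case.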
To check that (A)-(C) hold, without loss of generality, by Lemma 2 we may assume that the convergence scenario $(\mathcal{T}, R, \varepsilon)$ restricts to Y to give a convergence scenario $(Q, R^* = \{r, r^*, s, s^*\}, \varepsilon)$ as in one of Fig. 3i-iv and that $d_\varepsilon|_Y = d^*_\varepsilon$. We now check that (A)-(C) hold in each of the Cases (i)-(iv). First, note that since $d_\varepsilon$ is triplet respecting, we have $d_\varepsilon(x, y) < \min\{d_\varepsilon(x, z), d_\varepsilon(y, z)\}$, and so (A) must hold in all Cases (i)-(iv). Moreover, in Cases (i) and (iv) $d_\varepsilon(x, z) = d_\varepsilon(y, z)$, and so in these cases (B) and (C) must always hold too. In Cases (ii) and (iii), $d_\varepsilon(x, z) \leq d_\varepsilon(y, z)$, and so in these cases (B) holds. Hence, it suffices to show that (C) holds in Cases (ii) and (iii).
Let $\gamma$ be the height of $r^*$ and $s^*$ in the convergence scenario $(Q, R^* = \{r, r^*, s, s^*\}, \varepsilon)$. Note that $\gamma \geq \alpha$. In Case (ii), (C) holds if and only if … But, since $\gamma \geq \alpha$, this last inequality holds by Inequality (4).
In Case (iii), (C) holds if and only if … Again, since $\gamma \geq \alpha$, this last inequality holds by Inequality (4).
(⇒) Suppose that $d_\varepsilon$ is a metric so that, in particular, (C) holds for all distinct $x, y, z \in X$. Now, suppose $x, z \in X$ are distinct and strictly below R, and $y \in X$ is such that $((x, y), z) \in R(T)$ and $\mathrm{lca}_T(x, y)$ is not below a bottom point of R. Then either Case (ii) or Case (iii) of Fig. 3 must hold, and the height of $r^*$ and $s^*$ in the convergence scenario $(Q, R^* = \{r, r^*, s, s^*\}, \varepsilon)$ obtained by restricting to $Y = \{x, y, z\}$ must be equal to $\alpha$. But, as shown above, (C) holds in Case (ii) if and only if Inequality (5) holds, from which Inequality (4) follows, and (C) holds in Case (iii) if and only if Inequality (6) holds, from which Inequality (4) again follows.
Note that since the right-hand side of the inequality in the statement of Theorem 5 is always greater than 1, we have the following simple condition ensuring that a triplet respecting distance is a metric.
Corollary 6 Suppose $d_\varepsilon$ is a triplet respecting distance. If $\varepsilon \leq 1$, then $d_\varepsilon$ is a metric.
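Corollary 6 can likewise be illustrated on a small hypothetical example of our own: the tree ((x, y), (z, t)) with $h(\mathrm{lca}(x, y)) = 1$, $h(\mathrm{lca}(z, t)) = 2$, root height 4, $\alpha = 1.5$, $\beta = 3$, and hand-derived values $d_\varepsilon(x, z) = d_\varepsilon(y, z) = 8 - 3\varepsilon$, $d_\varepsilon(x, t) = d_\varepsilon(y, t) = 8 - 2\varepsilon$, $d_\varepsilon(x, y) = 2$, $d_\varepsilon(z, t) = 4$. For this example $d_\varepsilon$ is triplet respecting exactly when $\varepsilon < 4/3$. The Python sketch tests all triangle inequalities by brute force.

```python
from itertools import permutations

def d_eps_example(eps):
    """Hand-derived d_eps for the hypothetical scenario on ((x, y), (z, t))."""
    table = {("x", "y"): 2.0, ("t", "z"): 4.0,
             ("x", "z"): 8 - 3 * eps, ("y", "z"): 8 - 3 * eps,
             ("t", "x"): 8 - 2 * eps, ("t", "y"): 8 - 2 * eps}
    return lambda a, b: 0.0 if a == b else table[tuple(sorted((a, b)))]

def is_metric(d, X, tol=1e-9):
    """All triangle inequalities d(a, b) <= d(a, c) + d(c, b) on distinct triples."""
    return all(d(a, b) <= d(a, c) + d(c, b) + tol for a, b, c in permutations(X, 3))

for eps in (0.5, 1.0, 1.2, 2.5):
    print(eps, is_metric(d_eps_example(eps), "txyz"))
```

At $\varepsilon = 2.5$ the map is still a distance but no longer triplet respecting, so neither Theorem 5 nor Corollary 6 applies. The brute-force check finds that $d_\varepsilon(z, t) \leq d_\varepsilon(x, z) + d_\varepsilon(x, t)$ fails there (4 > 0.5 + 3).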
We now conclude this section by characterising when $d_\varepsilon$ is a tree metric or an ultrametric in case it is a triplet respecting metric. We first state a useful lemma.
Lemma 7 Suppose that $(\mathcal{T} = (T, w), R, \varepsilon)$ is a convergence scenario such that $d_\varepsilon$ is a triplet respecting metric, and T is one of the trees in Figs. 4 or 5 with leaf set {x, y, z, t}. …
Proof Note that since $d_\varepsilon$ is a metric, we only need to check that the 4-point condition holds for pairwise distinct x, y, z, t (i.e. we do not need to consider subsets of {x, y, z, t}). This is a straightforward check using the fact that $d_{\mathcal{T}}$ satisfies the 4-point condition, together with Theorem 3. For example, for the distance $d_\varepsilon$ arising from the tree $\mathcal{T}$ as in Fig. 5 … from which it follows that $d_\varepsilon$ does not satisfy the 4-point condition. However, in Fig. 5 configuration (1), we have …

Theorem 8 … Moreover, $d_\varepsilon$ is an ultrametric if and only if (a) holds. In particular, if $d_\varepsilon$ is a triplet respecting tree metric (or ultrametric), then $d_\varepsilon = d_{(T,w')}$, where w′ is some (necessarily unique) edge weighting of T.
Proof We begin by considering the first statement of the theorem. If |X| = 3, then the statement holds since any metric on a set of size 3 is a tree metric, and precisely one of the cases (a)-(c) can apply since they cover all possible convergence scenarios on X. So assume for the remainder of the proof that |X| ≥ 4. Now, suppose $d_\varepsilon$ is a tree metric on X. Put $R = \{r, r', s, s'\}$ and $\rho = \rho_T$. To see that one of (a)-(c) holds, we perform a case analysis in which we show that one of (a)-(c) must hold, or that we can find a subset $Y \subseteq X$ of size 4 such that, in view of Lemma 7, $d_\varepsilon|_Y$ is not a tree metric, which is impossible.
Suppose first that neither of the top points of R is contained in an edge of T that contains $\rho$. We claim that (a) must hold. To see why, let e be the edge of T that contains r. Then $\mathrm{lca}(r, s) \neq \rho$, since otherwise we can find leaves $x, y, z, t \in X$ such that the convergence scenario obtained by restricting $\mathcal{T}$ to $Y = \{x, y, z, t\}$ would be as in Fig. 5(6), which is impossible by Lemma 7. Moreover, r and $r'$ must both be contained in e, since otherwise we could choose elements $x, y, z, t \in X$ such that $\mathcal{T}$ restricted to $Y = \{x, y, z, t\}$ would be as in Fig. 4(10), which is impossible. Similarly, as $r, r'$ are both in e, s and $s'$ must both be contained in the same edge e′ of T, since otherwise we could obtain a contradiction using Fig. 4(10) again by reversing the roles of $r, r'$ and $s, s'$. And, finally, $e \cap e' \neq \emptyset$, since otherwise (reversing the roles of $r, r'$ and $s, s'$ if necessary) we could choose elements $x, y, z, t \in X$ such that $\mathcal{T}$ restricted to $Y = \{x, y, z, t\}$ would be as in Fig. 4(11), which is impossible. So (a) must hold, as claimed.
Now, assume that r is in an edge e of T with $\rho \in e$ so that, in particular, $\mathrm{lca}(R) = \rho$. Then either $r'$ is contained in e, or $r'$ is contained in an edge e″ with $|e \cap e''| = 1$. Indeed, if this were not the case, then there would exist at least two vertices of T that are contained in the path in T between r and $r'$. So we could find leaves $x, y, z, t \in X$ such that the convergence scenario obtained by restricting $\mathcal{T}$ to $Y = \{x, y, z, t\}$ would be as in Fig. 4(3), which is impossible.
Let e′ be the edge of T that contains s. We first consider the case that $r'$ is contained in e. Note that in this case we may assume that $\rho \in e'$. Indeed, if not, then (b) must hold, since otherwise we could find leaves $x, y, z, t \in X$ such that the convergence scenario obtained by restricting $\mathcal{T}$ to $Y = \{x, y, z, t\}$ would be as in Fig. 4(5) or (6) with the roles of $r, r'$ and $s, s'$ reversed, which is impossible. Note also that $s'$ must either be in e′ or in an edge e″ of T with $e' \cap e'' \neq \emptyset$, since otherwise we could obtain a configuration as in Fig. 4(3) with the roles of $r, r'$ and $s, s'$ reversed. But then (a) holds if $s'$ is contained in e′, and otherwise (b) holds.
Finally, suppose $r'$ is not contained in e. Then $\rho \in e'$, since otherwise we can find $x, y, z, t \in X$ and use Fig. 5(5) to obtain a contradiction. Moreover, $s'$ is contained in e′, since otherwise we could find $x, y, z, t \in X$ and use Fig. … 4), (8), (9), (12) or 5(1), (3), (7). In either of these cases $d_\varepsilon|_Y$ must be a tree metric by Lemma 7.
Similarly, if (c) holds, then since $d_\varepsilon|_Y \neq d_{\mathcal{T}}|_Y$ there must be an element of Y that is below s. Also, there must be (i) an element of Y that is below r but not below $r'$, or (ii) an element of Y that is below $r'$. If both (i) and (ii) hold, it follows that we can assume that Y = {x, y, z, t} and that $\mathcal{T}$ restricted to Y must be as in Figs. 5(2) or 4(2), (7). In either of these cases $d_\varepsilon|_Y$ must be a tree metric by Lemma 7. If only one of (i) or (ii) holds, then a similar argument can be used, where we may need to restrict to four elements of X to obtain a convergence scenario in a similar way to that used in Lemma 2 before applying Lemma 7.
We now consider the second statement of the theorem. First, suppose that (a) holds, so that there exist edges e, e′ of T such that $r, r'$ are points in e, $s, s'$ are points in e′, and $|e \cap e'| = 1$. To see that $d_\varepsilon$ is an ultrametric, we need to show that Inequality (2) holds for all distinct $x, y, z \in X$, i.e. that two of the values $d_\varepsilon(x, y)$, $d_\varepsilon(x, z)$ and $d_\varepsilon(y, z)$ are equal and not less than the third. It clearly suffices to show that this is the case for $x \in X$ below r, $y \in X$ below s, and $z \in X$.
If ((x, y), z) is a triplet in R(T), then it is easily seen that $d_\varepsilon(x, y) < d_\varepsilon(x, z) = d_\varepsilon(y, z)$. Otherwise, we can assume without loss of generality that ((x, z) …

Conversely, assume that $d_\varepsilon$ is an ultrametric. Then $d_\varepsilon$ is a tree metric. Hence, precisely one of (a)-(c) in Theorem 8 must hold. We now show that neither (b) nor (c) can hold, which will complete the proof of the theorem.
So, assume for contradiction that either (b) or (c) holds. If (b) holds, then pick $x \in X$ below $r'$, $z \in X$ below $s'$ and $y \in X$ below the vertex in $e \cap e'$ but not below $r'$. And, if (c) holds, then pick $x \in X$ below $r'$, $z \in X$ below $s'$ and $y \in X$ below r but not below $r'$. In either case, since $d_\varepsilon$ is a triplet respecting ultrametric, we must have $d_\varepsilon(y, z) = d_\varepsilon(x, z)$ (as the two largest values of $d_\varepsilon|_{\{x,y,z\}}$ must be equal). But this is clearly impossible since, by the definition of $d_\varepsilon$, we must have $d_\varepsilon(y, z) > d_\varepsilon(x, z)$ in both cases (b) and (c).
The last statement of the theorem holds in view of Semple and Steel (2003, Theorems 7.1.8 and 7.2.5).
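The distinction between tree metrics and ultrametrics in Theorem 8 can be checked on a small hypothetical example of our own: the tree ((x, y), (z, t)) with $h(\mathrm{lca}(x, y)) = 1$, $h(\mathrm{lca}(z, t)) = 2$, root height 4, $\alpha = 1.5$, $\beta = 3$, where r and $r'$ lie in the edge above lca(x, y), s lies in the edge above lca(z, t), and $s'$ lies in the pendant edge of z. The hand-derived values are $d_\varepsilon(x, z) = d_\varepsilon(y, z) = 8 - 3\varepsilon$, $d_\varepsilon(x, t) = d_\varepsilon(y, t) = 8 - 2\varepsilon$, $d_\varepsilon(x, y) = 2$ and $d_\varepsilon(z, t) = 4$. Since s and $s'$ lie in different edges, configuration (a) (as used in the proof: r, $r'$ in an edge e and s, $s'$ in an edge e′ with $|e \cap e'| = 1$) does not hold. For $\varepsilon = 1$ the Python sketch accordingly finds a tree metric that is not an ultrametric, which is consistent with Theorem 8. For $\varepsilon = 0$ the map reduces to $d_{\mathcal{T}}$, which is an ultrametric by construction.

```python
from itertools import combinations, permutations

LEAVES = "txyz"

def d_eps_example(eps):
    """Hand-derived d_eps for the hypothetical scenario on ((x, y), (z, t))."""
    table = {("x", "y"): 2.0, ("t", "z"): 4.0,
             ("x", "z"): 8 - 3 * eps, ("y", "z"): 8 - 3 * eps,
             ("t", "x"): 8 - 2 * eps, ("t", "y"): 8 - 2 * eps}
    return lambda a, b: 0.0 if a == b else table[tuple(sorted((a, b)))]

def satisfies_four_point(d, X, tol=1e-9):
    """Four-point condition on pairwise distinct quadruples (enough for a metric)."""
    for p, q, u, v in combinations(X, 4):
        sums = sorted([d(p, q) + d(u, v), d(p, u) + d(q, v), d(p, v) + d(q, u)])
        if sums[2] - sums[1] > tol:  # the two largest sums must coincide
            return False
    return True

def is_ultrametric(d, X, tol=1e-9):
    """Inequality (2) for all distinct triples."""
    return all(d(a, b) <= max(d(a, c), d(b, c)) + tol for a, b, c in permutations(X, 3))

for eps in (0.0, 1.0):
    d = d_eps_example(eps)
    print(eps, satisfies_four_point(d, LEAVES), is_ultrametric(d, LEAVES))
```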

Recovering the height of the tree within a convergence scenario
Consider the two convergence scenarios given in Fig. 4 … and $\beta = 3\frac{1}{2}$, and in Fig. 4(8) with $h(\rho_T) = 4\frac{1}{2}$, $h(\mathrm{lca}(x, y)) = 3$, $h(\mathrm{lca}(x, z)) = 1$, $\alpha = 1\frac{1}{2}$ and $\beta = 2$. Then it can be checked that, for $\varepsilon = 1$, both scenarios give rise to the same map $d_\varepsilon$, which is a distance in view of Lemma 1, triplet respecting in view of Theorem 3, and a metric in view of Theorem 5, even though the height of $\mathcal{T}$ differs between the two scenarios. In particular, for this example, even though we can recover the topology of $\mathcal{T}$ from $d_\varepsilon$ since it is a triplet respecting metric, we are not able to identify the height of $\mathcal{T}$ from $d_\varepsilon$, in the sense that there are different choices of R (but with the same $\varepsilon$ and T) which induce the same $d_\varepsilon$. Motivated by this example, in this section we characterise, for a convergence scenario $(\mathcal{T}, R, \varepsilon)$ that gives rise to a triplet respecting metric, which choices of R ensure that we are able to recover the height of $\mathcal{T}$ from $d_\varepsilon$ (see Theorem 11 below).
To make this more precise, for a convergence scenario $(\mathcal{T} = (T, w), R, \varepsilon)$ we also denote $d_\varepsilon$ by $d_{(w,R)}$ to emphasise the choice of w and R. In addition, for some choice of w and R, we say that the height of (T, w) is identifiable from $d_{(w,R)}$ if there does not exist a choice of w′ and R′ with $d_{(w,R)} = d_{(w',R')}$ and $h_{(T,w)}(\rho_T) \neq h_{(T,w')}(\rho_T)$.

We now give key examples of some choices of R in a convergence scenario where it is not possible to identify the height of the underlying tree. From now on, given a convergence scenario $(\mathcal{T} = (T, w), R, \varepsilon)$, we put $h_w(a) = h_{(T,w)}(a)$ for any point a in $\mathcal{T}$ if T is clear from the context. Note that in the following lemma we only require that $d_\varepsilon$ is a triplet respecting distance.
Lemma 9 Suppose that $(\mathcal{T}, R, \varepsilon)$ is a convergence scenario such that $d_\varepsilon$ is a triplet respecting distance. If R is one of the configurations in Fig. 7, then the height of $\mathcal{T}$ is not identifiable from $d_\varepsilon$.
Proof Put $\mathcal{T} = (T, w)$ and $\rho = \rho_T$. For R as in Fig. 7, let $l = h_w(r) - h_w(r') = h_w(s) - h_w(s')$. We now consider each of the configurations (A)-(C) in Fig. 7.
First, suppose $R = \{r, r', s, s'\}$ is as in Fig. 7A. Let $v \neq \rho$ denote the vertex in this configuration that is contained in the edges e, e′ which contain $r, r'$ and $s, s'$, respectively. Note that for any leaf x below e and any leaf y below e′, we have … We first claim that $\min\{w(e), w(e')\} > l$. Indeed, suppose that there are precisely two leaves x and y below v. Then x and y form a cherry of T. Hence, $w(e) = w(e')$ because (T, w) is equidistant. Since $d_{(w,R)}$ is a distance … and so $w(e) > l$. If there are at least three leaves of T below v, then we may assume without loss of generality that $w(e) = \min\{w(e), w(e')\}$, so that in particular there must be at least two leaves below $r'$. Let x and y be two leaves below $r'$ such that $\mathrm{lca}_T(x, y)$ is a child of v, and let z be a leaf below $s'$. Then … Since $d_{(w,R)}$ is triplet respecting but $(\mathcal{T}, R, \varepsilon)$ is not a cherry scenario, it follows by Theorem 3 that $w(e) > l$. So the claim follows. Now, consider the edge weighting w′ of T that is obtained as follows: add δ, for some small δ > 0, to the weights of the two edges containing the root of T; subtract l from the weights of the edges e and e′ (which is possible since $\min\{w(e), w(e')\} > l$); add l to the weight of the edge containing v that is not equal to e or e′ (which exists as $v \neq \rho$); and keep all other edge weights the same. In addition, place p, p′ into one edge containing $\rho$ and q, q′ into the other edge containing the root of T so that $h_{w'}(p) - h_{w'}(p') = h_{w'}(q) - h_{w'}(q') = \delta$, which is possible by taking δ sufficiently small. Then for $R' = \{p, p', q, q'\}$, it is straightforward to check that $d_{(w,R)} = d_{(w',R')}$ …

Now, suppose $R = \{r, r', s, s'\}$ is as in Fig. 7B, where r and s are as indicated in its caption. Consider the edge weighting w′ of T obtained by replacing the weights of the edges of T containing $\rho$ with the same weight plus δ, for some small δ > 0, and keeping all other edge weights the same. Relative to the weighting w′, place p, q in the same edges of T as r, s, respectively, at heights $h_w(r) + \delta$ and $h_w(s) + \delta$, respectively (which is possible since we can choose δ sufficiently small), and place p′, q′ in the same edges as $r', s'$ with heights (relative to w′) $h_w(r')$ and $h_w(s')$, respectively. Then for $R' = \{p, p', q, q'\}$, it is straightforward to check that … Finally, suppose $R = \{r, r', s, s'\}$ is as in Fig. 7C. Consider the weighting w′ of T that is obtained by replacing the weights of the edges of T containing $\rho$ with the same weight plus δ, for some small δ > 0, and keeping all other edge weights the same. Let u be the vertex of T adjacent to $\rho$ and above $r, r'$. Relative to the weighting w′, place q at height $h_w(u) + \delta$ and p at the same height in the other edge that contains the root of T (which is possible by taking δ sufficiently small). Also, place q′ in the same edge as $r'$ at height $h_w(u) - l$ (which is possible since the edge containing $r, r'$ in T with weight w has length greater than l), and p′ at the same height in the same edge that contains p. Then for $R' = \{p, p', q, q'\}$, it is straightforward to check that $d_{(w,R)} = d_{(w',R')}$, …

We now prove a technical lemma that we will use to prove the main result of this section. Note that this result does not depend on edge weights.
Lemma 10 Suppose that $(\mathcal{T} = (T, w), R, \varepsilon)$ is a convergence scenario. Then $R$ is not as in one of the configurations pictured in Fig. 7B or C if and only if precisely one of the Conditions (a)-(e) below holds:
(a) There are $x, y, z, t \in X$ such that $\mathrm{lca}_T(x, t) = \rho_T$, $u = \mathrm{lca}_T(x, y)$ and $v = \mathrm{lca}_T(z, t)$ are the children of $\rho_T$, one top point in $R$ is in the edge $\{\rho_T, v\}$, one bottom point of $R$ lies on the path between $v$ and $t$, and two points in $R$ lie on the path between $u$ and $x$.
(b) There are $x, y, z, t \in X$ such that $\mathrm{lca}_T(x, t) = \rho_T$, two points in $R$ lie on the path between $\rho_T$ and $x$, $u = \mathrm{lca}_T(y, t)$ is a child of $\rho_T$, $v = \mathrm{lca}_T(z, t)$ is a child of $u$, and two points in $R$ lie on the path between $v$ and $t$.
(c) There are $x, y, z, t \in X$ such that $\mathrm{lca}_T(x, t) = \rho_T$, two points in $R$ lie on the path between $\rho_T$ and $x$, $u = \mathrm{lca}_T(y, t)$ is a child of $\rho_T$, $v = \mathrm{lca}_T(z, t)$ is a child of $u$, one top point in $R$ is in the edge $\{u, v\}$, and one bottom point in $R$ lies on the path between $v$ and $t$.
(d) There are $x, y, z, t \in X$ such that $u = \mathrm{lca}_T(x, y)$ and $v = \mathrm{lca}_T(z, t)$ are the children of $\rho_T$, two points in $R$ lie on the path in $T$ between $u$ and $x$, and two points in $R$ lie on the path in $T$ between $v$ and $z$.
(e) All of the points in $R$ are contained in one of the subtrees of $T$ whose root is a child of $\rho_T$.
Proof It is straightforward to see that if any of Conditions (a)-(e) holds, then $R$ is not as in Fig. 7B or C. Conversely, suppose that $R$ is not as in Fig. 7B or C. First note that we may assume that $\mathrm{lca}(R) = \rho_T$, since otherwise (e) holds. Suppose first that one of the top points in $R$, say $r$, is in an edge of $T$ that contains the root $\rho_T$ of $T$. Since $R$ is not as in Fig. 7B, the top point $s$ is not contained in the other edge incident with $\rho_T$. Now, if the point $r' \in R$ below $r$ is not contained in the same edge as $r$, then (a) holds. On the other hand, if $r'$ is contained in the same edge as $r$, then, as $R$ is not as in Fig. 7C, it follows that (c) holds in case $s$ is in an edge incident with a child of $\rho_T$, and that (b) holds otherwise. Now, suppose that $r$ is in an edge of $T$ that does not contain the root of $T$, and let $s \in R$ be the other top point. By the preceding paragraph with the roles of $s$ and $r$ interchanged, we can assume that $s$ is also in an edge of $T$ that does not contain the root of $T$. But then (d) holds.
Fig. 7 The convergence scenarios considered in Theorem 11. The edge-weights assigned by $w$ to $T$ are arbitrary, but chosen so that $d_{(w,R)}$ is a triplet respecting metric. In B, $r'$ and $s'$ can be in any edge of $T$ below $r$ and $s$, respectively. Note that the roles of $r, r'$ and $s, s'$ are interchangeable.
We now prove the main result of this section.
Theorem 11 Suppose that $(\mathcal{T} = (T, w), R, \varepsilon)$ is a convergence scenario such that $d = d_{(\mathcal{T},R)}$ is a triplet respecting metric. Then the height of $\mathcal{T}$ is identifiable from $d$ if and only if $R$ is not one of the configurations in Fig. 7. Moreover, if this is the case, then $h_{\mathcal{T}}(\rho_T) = \frac{1}{2}\max_{x,y \in X} d(x, y)$.
Proof The 'only if' direction follows immediately from Lemma 9.
For the 'if' direction, write $h(u) = h_w(u)$ and $h'(u) = h_{w'}(u)$ for $u$ a point in $(T, w)$ or $(T, w')$, respectively. Furthermore, put $\rho = \rho_T$, $h = h(\rho)$ and $h' = h'(\rho)$. To see that this direction holds, suppose that $R = \{r, r', s, s'\}$ is not as in Fig. 7A-C, and assume for contradiction that the height of $\mathcal{T}$ is not identifiable from $d$. Then there exists some $(w', R' = \{p, p', q, q'\}) \neq (w, R)$ with $h' \neq h$ such that $d_{(w,R)} = d_{(w',R')}$.
First note that since $R$ is not as in Fig. 7A-C, and, in particular, not as in Fig. 7B or C, we may assume by Lemma 10 that one of Lemma 10(a)-(e) holds, and that if Lemma 10(e) holds then $R$ is not as in Fig. 7A. Using the notation in Lemma 10(a)-(e), we now show that each of these cases leads to a contradiction, which will complete the proof of the first part of the theorem.
First note that if (a) holds, then since $d_{(w',R')}(y, z) = d_{(w,R)}(y, z) = 2h$ and $h' \neq h$, it follows that $h' > h$, and that $p, q$ are contained in the two edges that contain the root of $T$. A similar argument can also be applied to each of the cases (b)-(e) to show that in all of these cases $p, q$ must be in the two edges that contain the root of $T$, so we shall assume this from now on.
Lemma 10(a) holds: Without loss of generality assume that $r$ and $r'$ are on the path from $u$ to $x$, $s$ is in the edge $\{\rho, v\}$, and $s'$ is on the path from $v$ to $t$ in $T$. Note that this implies $h(u) > h(v)$. Also, without loss of generality, we assume that $p$ is above $u$ and $q$ is above $v$ in $(T, w')$.
Hence, using a similar argument, we see that $q'$ must in fact lie on the path from $v$ to $t$, as $d_{(w,R)}(b, x) \geq d_{(w,R)}(b, t)$ for all $b \neq t$ with $\mathrm{lca}_T(b, z) = v$. Note also that $p'$ is not above …; it follows that $p'$ must lie on the path from $u$ to $x$.
Lemma 10(b) holds: Without loss of generality assume that $r, r'$ are on the path from $\rho$ to $x$ and $s, s'$ are on the path from $v$ to $t$. Also, without loss of generality, assume that $p$ is on the path from $\rho$ to $x$ and $q$ is in the edge $\{\rho, u\}$ in $(T, w')$.
Note that $q'$ cannot be in the same edge $\{\rho, u\}$ as $q$, since this implies $d_{(w',R')}(x, y) = \ldots$, which is impossible. Thus $q'$ must be below $u$. But then we would have …, which is also impossible.
Lemma 10(c) holds: Without loss of generality, assume that $r, r'$ are on the path from $\rho$ to $x$, $s$ is in the edge $\{u, v\}$, and $s'$ is on the path from $v$ to $t$ in $T$. Without loss of generality, we also assume that $q$ is contained in the edge $\{\rho, u\}$.
Using similar arguments to the ones in case (a), note that $\mathrm{lca}_T(q', y) \ldots$, and thus $q'$ is on the path from $v$ to $t$ since $d_{(w,R)} \ldots$.
Lemma 10(d) holds: Without loss of generality, assume that $r, r'$ lie on the path in $T$ between $u$ and $x$, and that $s, s'$ lie on the path in $T$ between $v$ and $z$. Without loss of generality, we assume that $p$ is contained in the edge $\{\rho, u\}$. Note that as $d_{(w,R)}$ is a triplet respecting metric, by Theorem 8 $d_{(w,R)}$ is not an ultrametric, and so $p'$ and $q'$ are not both in the same edges as $p$ and $q$, respectively (otherwise, by Theorem 8, $d$ would be an ultrametric).
Without loss of generality, suppose that $p'$ is not in the same edge as $p$. Then $p'$ must be on the path in $(T, w')$ from $u$ to $x$; otherwise there would be some $b \in X - \{x\}$ below $p'$ with $\mathrm{lca}_T(b, x) = u$ or $\mathrm{lca}_T(b, x)$ below $u$, which implies $d \ldots$. A similar argument also implies that $q'$ must be on the path in $T$ from $q$ to $z$. Now if $q'$ is below $v$, then this leads immediately to a contradiction, since it implies that $d \ldots$. And if $q'$ is in the same edge $\{\rho, v\}$ as $q$, then we again obtain a contradiction, since then $d \ldots$.
Lemma 10(e) holds and $R$ is not as in Fig. 7A: Note that as Lemma 10(e) holds, we must have $d_{(w,R)}(x, y) = 2h = d_{(w',R')}(x, y)$ for all $x, y \in X$ with $\mathrm{lca}_T(x, y) = \rho_T$. But then, as $h \neq h'$, it is straightforward to check that $p$ and $p'$ must be in the same edge of $T$ that contains the root, and that the same holds for $q$ and $q'$. Hence, since $d$ is a triplet respecting metric by assumption, it follows by Theorem 8 that $d_{(w',R')}$ is an ultrametric. But this is impossible, since $R$ is not as in Fig. 7A and so, by Theorem 8, $d_{(w,R)} = d_{(w',R')}$ is not an ultrametric.
The last statement of the theorem holds since if $R$ is one of the configurations in Lemma 10, then it is straightforward to check that there must exist some $a, b \in X$ such that $d_{(w,R)}(a, b) = 2h$ (since in all of (a)-(e) at least one of the edges in $T$ containing the root of $T$ does not contain a top point of $R$), and clearly $d_{(w,R)}(x, y) \leq 2h$ for all $x, y \in X$.
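The two properties used at the end of this proof, whether a distance is an ultrametric and the height formula $h_{\mathcal{T}}(\rho_T) = \frac{1}{2}\max_{x,y} d(x,y)$, are easy to evaluate on a finite distance. The sketch below is a minimal illustration, assuming distances are stored in a dict keyed by `frozenset` pairs of leaf labels; the function names and the four-leaf example are hypothetical. It uses the standard three-point characterisation of ultrametrics, and it does not check whether $R$ avoids the configurations in Fig. 7, so the height formula is only meaningful when Theorem 11 applies.

```python
from itertools import combinations

def is_ultrametric(d, taxa, tol=1e-12):
    """Three-point condition: in every 3-set {x, y, z} the two largest of
    d(x, y), d(x, z), d(y, z) coincide (up to a numerical tolerance)."""
    for x, y, z in combinations(taxa, 3):
        _, mid, top = sorted((d[frozenset((x, y))],
                              d[frozenset((x, z))],
                              d[frozenset((y, z))]))
        if top - mid > tol:
            return False
    return True

def height_from_distance(d):
    """Formula from the last statement of Theorem 11: half the largest pairwise
    distance. This does NOT verify that the height is identifiable."""
    return 0.5 * max(d.values())

# Hypothetical example on leaves a, b, c, d with tree ((a, b), (c, d)).
ultra = {frozenset(p): v for p, v in
         {"ab": 2, "cd": 2, "ac": 8, "ad": 8, "bc": 8, "bd": 8}.items()}
modified = dict(ultra)
modified[frozenset("ac")] = 6  # a single lowered cross-root distance
```

In the example, lowering $d(a, c)$ destroys the ultrametric property, but the largest distance (and hence the height estimate $4$) is unchanged because other pairs across the root are unaffected.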
Corollary 12 Suppose that $(\mathcal{T}, R, \varepsilon)$ is a convergence scenario such that $d$ is a triplet respecting tree metric. Then the height of $\mathcal{T}$ is not identifiable from $d$.
Proof If $d$ is a tree metric, then, by Theorem 8, $R$ must be as in Fig. 6. But if the height of $\mathcal{T}$ is identifiable from $d$, then, by Theorem 11, either one of Lemma 10(a)-(d) holds, or Lemma 10(e) holds and $R$ is not as in Fig. 7A. It is straightforward to check that neither is possible when $R$ is as in Fig. 6.

Remark 2
The situation in Theorem 11 becomes more complicated if we also allow the parameter $\varepsilon$ to vary. For example, consider the convergence scenario $(\mathcal{T} = (T, w), R, \varepsilon)$ in Fig. 5(5), where the weight of the edge $\{\rho_T, \mathrm{lca}_T(x, y)\}$ is $2$, $h(\mathrm{lca}_T(x, y)) = 4$, $h(\mathrm{lca}_T(z, t)) = 2$, $\beta = 3$, $\alpha = \frac{3}{2}$ and $\varepsilon = \frac{1}{2}$, and the convergence scenario $(\mathcal{T}' = (T, w'), R', \varepsilon')$ in Fig. 5(4), where the weight of the edge $\{\rho_T, \mathrm{lca}_T(x, y)\}$ is $2\frac{1}{4}$, $h(\mathrm{lca}_T(x, y)) = 4$, $h(\mathrm{lca}_T(z, t)) = 2$, $\beta = 5$ and $\alpha = 1$, where …, even though the configuration of $R$ in $T$ is not as in Fig. 7 (since $R$ corresponds to the configuration given in Lemma 10(a)). It could be interesting to understand when the height of $\mathcal{T}$ is identifiable in case the parameter $\varepsilon$ is also allowed to vary.
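As a quick arithmetic check of the data in Remark 2, using only the fact that both trees are equidistant (so that the height of the root is the height of $\mathrm{lca}_T(x, y)$ plus the weight of the edge above it), the two scenarios have different root heights and convergence periods of different lengths:

$$h_{w}(\rho_T) = 4 + 2 = 6, \qquad h_{w'}(\rho_T) = 4 + 2\tfrac{1}{4} = \tfrac{25}{4}, \qquad \beta - \alpha = 3 - \tfrac{3}{2} = \tfrac{3}{2} \quad \text{versus} \quad \beta - \alpha = 5 - 1 = 4.$$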

Discussion
We have introduced a new distance-based model for convergent evolution and characterised when the model leads to a tree metric, as well as giving conditions in terms of the model's parameters for when it is still possible to recover the underlying tree and its height even in case we do not obtain a tree metric. Our model is similar in nature to the convergence-divergence models presented in Mitchell et al. (2018), in which a probabilistic approach is developed based on a Markov model of character evolution. In our distance-based approach convergence acts in a linear way, whereas in the character-based approach two sequences that are converging converge faster when they are further apart and more slowly as they get closer, since there are fewer mismatched sites to "correct".
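The contrast between linear and mismatch-driven convergence can be illustrated with a toy calculation. This is a hedged sketch and not taken from either model: it assumes that in the linear case the distance between two converging lineages drops at a constant rate $2\varepsilon$ per unit time (so by $2\varepsilon(\beta - \alpha)$ over the whole period), and that in the character-based picture the mismatch proportion $m$ decays as $dm/dt = -c\,m$ for a hypothetical rate constant `c`.

```python
import math

def linear_convergence(d0: float, eps: float, duration: float) -> float:
    """Toy linear model: the distance drops by 2*eps per unit time (floored at 0)."""
    return max(d0 - 2.0 * eps * duration, 0.0)

def proportional_convergence(m0: float, c: float, duration: float) -> float:
    """Toy character-based picture: dm/dt = -c*m, so mismatches are 'corrected'
    quickly while many remain and more slowly as the sequences become similar."""
    return m0 * math.exp(-c * duration)

if __name__ == "__main__":
    for t in (0.0, 1.0, 2.0, 4.0):
        print(t, linear_convergence(1.0, 0.1, t),
              round(proportional_convergence(1.0, 0.5, t), 4))
```

In the linear model every unit of time removes the same amount of distance, whereas in the proportional model the amount removed per unit time shrinks as the two sequences approach each other.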
In Mitchell et al. (2018), the authors mainly focus on phylogenetic trees that have three or four leaves, where they also find cases in which convergence gives rise to tree metrics (e.g. in Mitchell et al. (2018, Fig. 6) they give an example similar to our Fig. 4(8)). Since our model can be applied to a set of species of arbitrary size, it could be interesting to understand whether there are deeper connections between the two approaches that could be exploited to give further insights into convergent evolution for larger data sets. A starting point might be to consider the interplay of our approach with the Jukes-Cantor model (Jukes and Cantor 1969), one of the simplest Markov models used to correct distance data in evolutionary studies [see e.g. (Felsenstein, 2004, Chapter 11)].
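For reference, the Jukes-Cantor correction converts an observed proportion $p$ of differing sites into an estimated number of substitutions per site via $d = -\frac{3}{4}\ln\left(1 - \frac{4}{3}p\right)$, which is defined for $0 \leq p < \frac{3}{4}$. A minimal sketch follows; the function names are hypothetical, and how this correction would interact with the convergence model is the open question raised above, not something the code addresses.

```python
import math

def jukes_cantor_distance(p: float) -> float:
    """Jukes-Cantor correction d = -(3/4) ln(1 - (4/3) p) of an observed
    proportion p of differing sites; defined only for 0 <= p < 3/4."""
    if not 0.0 <= p < 0.75:
        raise ValueError("the Jukes-Cantor correction requires 0 <= p < 3/4")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def correct_matrix(p_dist):
    """Apply the correction entrywise to a dict of observed p-distances."""
    return {pair: jukes_cantor_distance(p) for pair, p in p_dist.items()}
```

The corrected distance grows faster than $p$, so large observed differences are inflated the most, and it diverges as $p$ approaches $\frac{3}{4}$.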
Our results suggest that under some circumstances the tree topology and some information about convergence events may be recoverable from observed distances. We give conditions for recovering the topology of the underlying tree and its overall height. In general, the starting and ending points $\alpha$ and $\beta$ are not precisely recoverable, as they only affect the distances via their difference $\beta - \alpha$. However, if $\varepsilon$ is assumed to be known, it may be possible to determine $\beta - \alpha$ and also to determine on which edges the points $r, s, r'$ and $s'$ in the convergence set must lie. If the strength of convergence is not known, then it will presumably not be possible to determine $\beta - \alpha$, as strong convergence acting for a shorter time period would appear equivalent to weaker convergence acting over a longer time period. However, in this case it may again be possible to at least localise the edges on which the points $r, s, r'$ and $s'$ occur.
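The confounding of $\varepsilon$ and $\beta - \alpha$ can be seen in a toy computation. The sketch below is only an illustration and assumes a simplified form of the model in which convergence lowers $d(x, y)$ by $2\varepsilon(\beta - \alpha)$ whenever one of $x, y$ lies below $r'$ and the other below $s'$ (the quantity $2\varepsilon(\beta - \alpha)$ appears in the proof of Theorem 8, but this is not the full definition of the model). The function name and the four-leaf example are hypothetical.

```python
def toy_convergent_distance(ultra, below_r, below_s, eps, alpha, beta):
    """Hypothetical simplification, for illustration only: lower d(x, y) by
    2*eps*(beta - alpha) when one of x, y lies below r' and the other below s';
    every other entry of the ultrametric `ultra` is left unchanged."""
    shift = 2.0 * eps * (beta - alpha)
    result = {}
    for pair, value in ultra.items():
        x, y = tuple(pair)
        crossing = (x in below_r and y in below_s) or (x in below_s and y in below_r)
        result[pair] = value - shift if crossing else value
    return result

ultra = {frozenset(p): v for p, v in
         {"ab": 2, "cd": 2, "ac": 8, "ad": 8, "bc": 8, "bd": 8}.items()}
# Three parameter choices with the same product eps * (beta - alpha) = 1:
d1 = toy_convergent_distance(ultra, {"a"}, {"c"}, eps=0.5, alpha=1.0, beta=3.0)
d2 = toy_convergent_distance(ultra, {"a"}, {"c"}, eps=1.0, alpha=2.0, beta=3.0)
d3 = toy_convergent_distance(ultra, {"a"}, {"c"}, eps=0.5, alpha=0.5, beta=2.5)
```

All three parameter choices yield the same distance, so in this toy setting neither $\alpha$, $\beta$ nor $\varepsilon$ can be read off individually; only the product $\varepsilon(\beta - \alpha)$ is visible.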
More generally, we have only considered the case of a single pair of convergent paths. In future work, it would be interesting to consider conditions under which multiple convergence events might be distinguishable from simple tree-like evolution. Even so, some care may need to be taken in choosing the number of parameters, as there could be issues with overfitting [see (Steel 2005)], as well as with our underlying assumption of clock-like evolution [see (Mitchell et al. 2018, p. 914) for related discussion]. Furthermore, in practice it would be useful to develop algorithms that take a distance matrix as input and return a phylogenetic tree together with pairs of sets of edges for which there is evidence of convergence.
While this paper shows that in general it will not be possible to recover all convergence events (or even one convergence event) from an observed distance, the results suggest intriguing possibilities for algorithms that could at least recover some partial information. This could be particularly useful in cases where convergence events have had a small enough impact that the input metric is still triplet preserving.
One approach that we hope to explore in future work is to develop a method based upon algorithmic variants of the Build algorithm, which can construct phylogenetic trees from sets of triplets [see e.g. Semple and Steel (2000)]. More specifically, we would begin by computing an unweighted tree $T$ from a collection of triplets inferred from an input distance $d$ on a set $X$, after which we would set the height of any internal vertex of $T$ to be half the maximum taken over all distances between pairs of taxa whose least common ancestor is that vertex. The distance associated with this weighted tree $\mathcal{T}$ then forms an ultrametric $d_{\mathcal{T}}$ that is greater than or equal to the observed distance $d$ for every pair in $X$. We would then look for a convergence scenario $(\mathcal{T}, R, \varepsilon)$ with the aim of minimising the discrepancies $d_{(\mathcal{T},R)}(x, y) - d(x, y)$, $x, y \in X$. Note that, in general, we would expect $d$ to deviate from an ultrametric simply due to random sampling rather than convergence, so we would probably also need to define some threshold of improvement to control the addition of convergence events.
Finally, as noted in the introduction, reticulate processes can also lead to a breakdown of the divergence model for evolution. Interestingly, Francis and Steel (2015, Theorem 5) show that if a special type of phylogenetic network called a horizontal gene transfer network has a single cycle, then the so-called average distance (Willson 2012) that it induces satisfies the four-point condition if and only if the arcs in the network satisfy certain specific conditions. This result has similarities to what we find in Theorem 3, and indicates that models of reticulate evolution can also lead to tree metrics. Thus, it will be important to develop approaches that allow us to distinguish between distances that are generated by reticulate versus convergent evolution. However, this may not always be mathematically possible, in which case, as suggested in Mitchell et al. (2018), it may be useful to consider additional biological or biogeographical information to help decide which model to employ.
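The first steps of the Build-based procedure outlined above (infer triplets, build a tree, assign heights, form the ultrametric $d_{\mathcal{T}}$) can be sketched as follows. This is a minimal illustration under stated assumptions, not the method the authors intend to develop: it assumes, hypothetically, that a triplet $ab|c$ is inferred whenever $d(a, b)$ is strictly smaller than both $d(a, c)$ and $d(b, c)$; it uses the classical BUILD algorithm of Aho et al. for the tree-building step; and it stops before any search for a convergence scenario. All function names and the example distance are hypothetical.

```python
from itertools import combinations

def triplets_from_distance(d, taxa):
    """Hypothetical rule: record ab|c (stored as (a, b, c)) whenever d(a, b)
    is strictly smaller than both d(a, c) and d(b, c)."""
    trips = []
    for x, y, z in combinations(taxa, 3):
        for a, b, c in ((x, y, z), (x, z, y), (y, z, x)):
            if d[frozenset((a, b))] < min(d[frozenset((a, c))], d[frozenset((b, c))]):
                trips.append((a, b, c))
    return trips

def build(taxa, trips):
    """BUILD (Aho et al.): nested-tuple tree on `taxa`, or None if the triplets
    are incompatible. Leaves are labels; internal vertices are tuples."""
    taxa = list(taxa)
    if len(taxa) == 1:
        return taxa[0]
    present = set(taxa)
    parent = {x: x for x in taxa}  # union-find over the Aho graph

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join a and b for every triplet ab|c whose three leaves are all present.
    for a, b, c in trips:
        if a in present and b in present and c in present:
            parent[find(a)] = find(b)
    components = {}
    for x in taxa:
        components.setdefault(find(x), []).append(x)
    if len(components) == 1:
        return None  # connected Aho graph: no consistent split exists
    children = [build(comp, trips) for comp in components.values()]
    return None if any(ch is None for ch in children) else tuple(children)

def leaves(node):
    """All leaf labels below `node`."""
    if not isinstance(node, tuple):
        return [node]
    return [leaf for child in node for leaf in leaves(child)]

def vertex_heights(node, d, out=None):
    """Height of an internal vertex: half the largest distance between two taxa
    whose least common ancestor is that vertex (taxa in different children)."""
    out = {} if out is None else out
    if isinstance(node, tuple):
        out[node] = 0.5 * max(d[frozenset((x, y))]
                              for c1, c2 in combinations(node, 2)
                              for x in leaves(c1) for y in leaves(c2))
        for child in node:
            vertex_heights(child, d, out)
    return out

def induced_ultrametric(node, heights, out=None):
    """d_T(x, y) = 2 * height(lca(x, y)) for the weighted tree built above."""
    out = {} if out is None else out
    if isinstance(node, tuple):
        for c1, c2 in combinations(node, 2):
            for x in leaves(c1):
                for y in leaves(c2):
                    out[frozenset((x, y))] = 2.0 * heights[node]
        for child in node:
            induced_ultrametric(child, heights, out)
    return out

example = {frozenset(p): float(v) for p, v in
           {"ab": 2, "cd": 2, "ac": 6, "ad": 8, "bc": 8, "bd": 8}.items()}
tree = build("abcd", triplets_from_distance(example, "abcd"))
heights = vertex_heights(tree, example)
d_T = induced_ultrametric(tree, heights)
```

On this example the tree $((a, b), (c, d))$ is recovered, the root gets height $4$, and the lowered distance $d(a, c) = 6$ is lifted to $d_{\mathcal{T}}(a, c) = 8$, so the discrepancy $d_{\mathcal{T}} - d$ singles out the pair $(a, c)$ as a candidate for convergence.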

Fig. 1 An example of a convergence model on the set of species $X = \{a, b, c, d, e, f, x, y, z\}$ associated to an edge-weighted rooted tree $T$ with root $\rho_T$. The weights of the edges are proportional to their lengths. The points $r, s$ represent two ancestral species which have converged to give rise to the species $r', s'$, respectively, over a period of time that is proportional to the length of the two bold paths.

Fig. 4 A phylogenetic tree $T$, together with a table that indicates in each column which edges contain the top points $r, s$ and bottom points $r', s'$ used to form a convergence scenario. For example, column 10 indicates that top point $r$ is contained in edge $e_3$, bottom point $r'$ is in edge $e_1$, and points $s, s'$ are both contained in edge $e_4$. The last row indicates whether or not the configuration gives rise to a distance satisfying the four-point condition (see Lemma 7).

Proof First note that since $d$ is a distance, $d$ is a metric if and only if the triangle inequality holds for every triple $Y = \{x, y, z\}$ of distinct elements $x, y, z \in X$, i.e. (A) $d(x, y) \leq d(x, z) + d(z, y)$, (B) $d(x, z) \leq d(x, y) + d(y, z)$, and (C) $d(y, z) \leq d(y, x) + d(x, z)$ all hold.
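The reduction above, checking (A)-(C) on every 3-set of distinct elements, translates directly into a finite test. A minimal sketch, assuming (as in the text) that $d$ is already a distance and is stored as a dict keyed by `frozenset` pairs; the function name and tolerance parameter are hypothetical.

```python
from itertools import combinations

def is_metric(d, taxa, tol=1e-12):
    """Given a distance d (symmetric, positive off the diagonal) stored as a dict
    keyed by frozenset pairs, check conditions (A)-(C) on every 3-set of taxa."""
    def D(a, b):
        return d[frozenset((a, b))]
    for x, y, z in combinations(taxa, 3):
        if (D(x, y) > D(x, z) + D(z, y) + tol        # (A)
                or D(x, z) > D(x, y) + D(y, z) + tol  # (B)
                or D(y, z) > D(y, x) + D(x, z) + tol):  # (C)
            return False
    return True
```

Because $d$ is symmetric, each 3-set only needs to be visited once, which is why `combinations` suffices.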

Fig. 5 A phylogenetic tree $T$, together with a table that indicates in each column which edges contain the top points $r, s$ and bottom points $r', s'$ used to form a convergence scenario as in Fig. 4. The last row indicates whether or not the configuration gives rise to a distance satisfying the four-point condition (see Lemma 7).

Fig. 6 Possible placements of $r, r', s, s'$ for the proof of Theorem 8.

Theorem 8 Suppose that $(\mathcal{T} = (T, w), R, \varepsilon)$ is a convergence scenario such that $d$ is a triplet respecting metric. Then $d$ is a tree metric if and only if precisely one of the following holds (see Fig. 6):
(a) There exist edges $e, e'$ in $T$ such that $|e \cap e'| = 1$ and $|R \cap e| = |R \cap e'| = 2$.
(b) There exist edges $e, e', e''$ in $T$ such that $e$ and $e'$ both contain the root of $T$, $|e \cap e''| = 1$, $|e' \cap e''| = 0$, and $|R \cap e'| = |R \cap e''| = 2$.
(c) There exist edges $e, e', e''$ in $T$ such that $e$ and $e'$ both contain the root of $T$, $|e \cap e''| = 1$, $|R \cap e| = |R \cap e'| = 1$ and $|R \cap e''| = 2$.
… $d_{\mathcal{T}}$ satisfies the 4-point condition) and by Theorem 3 $d_{\mathcal{T}}(x, z) - d_{\mathcal{T}}(x, y) > 2\varepsilon(\beta - \alpha)$ and $d_{\mathcal{T}}(y, t) - d_{\mathcal{T}}(z, t) > 2\varepsilon(\beta - \alpha)$ both hold, from which it follows that $d(x, y) + d(z, t) < d(x, z) + d(y, t) = d(x, t) + d(y, z)$.
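The last inequality above is an instance of the four-point condition: for every 4-set $\{x, y, z, t\}$, the two largest of the three sums $d(x, y) + d(z, t)$, $d(x, z) + d(y, t)$ and $d(x, t) + d(y, z)$ are equal. For a metric, checking 4-sets of distinct elements suffices (the cases with repeated points reduce to the triangle inequality), which matches the strategy of restricting to $Y \subseteq X$ with $|Y| = 4$. A minimal sketch with a hypothetical function name and example distances:

```python
from itertools import combinations

def satisfies_four_point(d, taxa, tol=1e-12):
    """Four-point condition on 4-sets of distinct taxa: of the three sums
    d(x,y)+d(z,t), d(x,z)+d(y,t), d(x,t)+d(y,z), the two largest coincide."""
    def D(a, b):
        return d[frozenset((a, b))]
    for x, y, z, t in combinations(taxa, 4):
        sums = sorted((D(x, y) + D(z, t), D(x, z) + D(y, t), D(x, t) + D(y, z)))
        if sums[2] - sums[1] > tol:
            return False
    return True

def as_dist(table):
    """Helper: build a frozenset-keyed distance from {"ab": value, ...}."""
    return {frozenset(p): float(v) for p, v in table.items()}

quartet = as_dist({"ab": 2, "cd": 2, "ac": 4, "ad": 4, "bc": 4, "bd": 4})
lowered = as_dist({"ab": 2, "cd": 2, "ac": 6, "ad": 8, "bc": 8, "bd": 8})
```

Here `quartet` comes from a tree $ab|cd$ with unit pendant edges and an internal edge of length $2$ (sums $4, 8, 8$), whereas `lowered` has sums $4, 14, 16$ and so is not a tree metric.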
… 5(4) to obtain a contradiction. Thus (c) must hold. Conversely, suppose that precisely one of (a)-(c) holds. We will show that $d$ is a tree metric. Take any $Y \subseteq X$ with $|Y| = 4$. It suffices to show that $d|_Y$ is a tree metric. If $d|_Y = d_{\mathcal{T}}|_Y$, then this is clearly the case. So we may assume $d|_Y \neq d_{\mathcal{T}}|_Y$. If one of (a) or (b) holds, then since $d|_Y \neq d_{\mathcal{T}}|_Y$ there must be some pair of elements in $Y$ that is strictly below $R$. It follows that we can assume that $Y = \{x, y, z, t\}$ and that $T$ restricted to $Y$ must be as in Figs. 4(1), (…