A map \(d:X\times X\rightarrow {\mathbb {R}}_0^+\) is a metric if it satisfies, for all \(x,y,z\in X\):
- (M0) \(d(x,x)=0\).
- (M1) If \(d(x,y)=0\), then \(x=y\).
- (M2) \(d(x,y)=d(y,x)\).
- (M3) \(d(x,y)+d(y,z)\ge d(x,z)\).
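The axioms (M0) to (M3) can be tested mechanically on a finite point set. The following is a minimal sketch; the function name `is_metric`, the tolerance `tol`, and the two example distance functions are illustrative assumptions and not part of the original text.

```python
from itertools import product

def is_metric(points, d, tol=1e-12):
    """Check axioms (M0)-(M3) for a distance function d on a finite point set."""
    for x, y in product(points, repeat=2):
        if d(x, y) < 0:                          # values must lie in R_0^+
            return False
        if x == y and abs(d(x, y)) > tol:        # (M0)
            return False
        if x != y and abs(d(x, y)) <= tol:       # (M1)
            return False
        if abs(d(x, y) - d(y, x)) > tol:         # (M2)
            return False
    for x, y, z in product(points, repeat=3):    # (M3)
        if d(x, y) + d(y, z) < d(x, z) - tol:
            return False
    return True

points = [0.0, 1.0, 3.5]
abs_diff = lambda x, y: abs(x - y)
sq_diff = lambda x, y: (x - y) ** 2              # violates the triangle inequality

print(is_metric(points, abs_diff))  # True
print(is_metric(points, sq_diff))   # False: 1 + 6.25 < 12.25
```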
Distance measures can be used for clustering and thus serve as a means of extracting hierarchical, i.e., tree-like, structures on a set of data.
The basis of distance-based phylogenetic methods is additive metrics, i.e., metrics that are representations of edge-weighted trees. Consider a tree \(T\) with leaf-set \(X\) and a length function \(\ell\) defined on the edges of \(T\). Recall that every pair of leaves \(x\) and \(y\) is connected by a unique path \(\mathbf{p}_{xy}\) in \(T\). The length of this path, i.e., the sum of its edge lengths, defines the distance \(d_T(x,y)\). Additive metrics are those that derive from a tree in this manner. A famous theorem (Buneman 1974; Cunningham 1978; Dobson 1974; Simões-Pereira 1969) shows that additive metrics are characterized by the four-point condition: A metric is additive if and only if the following holds for any four points \(u,v,x,y\in X\):
- (MA) \(d(u,v)+d(x,y) \le \max \{d(u,x)+d(v,y),\; d(u,y)+d(v,x)\}.\)
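As an illustration of the construction of \(d_T\) and of condition (MA), the sketch below computes path-length distances on a small edge-weighted tree and checks the four-point condition. The quartet tree, its edge lengths, and the function names `tree_distances` and `satisfies_four_point` are hypothetical choices made for this example only.

```python
from itertools import product

def tree_distances(edges, leaves):
    """Path-length distances d_T between the leaves of an edge-weighted tree."""
    adj = {}
    for a, b, length in edges:
        adj.setdefault(a, []).append((b, length))
        adj.setdefault(b, []).append((a, length))
    dist = {}
    for source in leaves:
        seen = {source: 0.0}
        stack = [source]
        while stack:                       # depth-first traversal from source
            node = stack.pop()
            for nxt, length in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + length
                    stack.append(nxt)
        for target in leaves:
            dist[source, target] = seen[target]
    return dist

def satisfies_four_point(points, dist, tol=1e-9):
    """Check condition (MA) for all quadruples, repeated points included."""
    for u, v, x, y in product(points, repeat=4):
        lhs = dist[u, v] + dist[x, y]
        rhs = max(dist[u, x] + dist[v, y], dist[u, y] + dist[v, x])
        if lhs > rhs + tol:
            return False
    return True

# Hypothetical quartet tree: a, b attach to p; c, d attach to q; p-q is the inner edge.
edges = [("a", "p", 1.0), ("b", "p", 2.0), ("p", "q", 3.0),
         ("c", "q", 1.5), ("d", "q", 0.5)]
leaves = ["a", "b", "c", "d"]
d_T = tree_distances(edges, leaves)

print(d_T["a", "c"])                      # 5.5
print(satisfies_four_point(leaves, d_T))  # True
```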
The appearance of additive metrics in evolutionary processes can be justified rigorously for specific models. For example, Markovian processes on strings of fixed length lead to distances that can be estimated directly from the data. Denote by \(c_{ab}(x,y)\) the fraction of characters in which \(x\) has state \(a\) and \(y\) has state \(b\); for each pair \((x,y)\), these fractions can be arranged in a matrix \({\mathbf {C}}(x,y) = \big (c_{ab}(x,y)\big )_{a,b}\). Steel (1994) showed that (the expected values of) \(d(x,y):= -\ln \vert \mathrm {det}({\mathbf {C}}(x,y))\vert\) form an additive metric. Well-known results from phylogenetic combinatorics show that, given an additive metric, the tree \(T\) and its edge lengths can be reconstructed readily; see, e.g., the work of Apresjan (1966), Imrich and Stockiĭ (1972), Buneman (1974), Dress (1984), Bandelt and Dress (1992), Dress et al. (2010a). The well-known neighbor-joining algorithm (Saitou and Nei 1987), a special case of a large class of agglomerative clustering algorithms, solves this problem efficiently and was shown to always compute the correct tree when presented with an additive metric; see the survey by Gascuel and Steel (2006) and the references therein. Additivity of the underlying metric is also assumed in a recent generalization of phylogenetic trees that allows data points to appear not only as leaves but also as interior vertices of the reconstructed tree (Telles et al. 2013).
A stronger condition than additivity is ultrametricity, which is characterized by the strong triangle inequality
- (MU) \(d(x,z)\le \max \{d(x,y),d(y,z)\}.\)
Condition (MU) means that all triangles are “isosceles with a short base”, i.e., two sides of every triangle have equal length and the third is no longer than these two. Ultrametrics appear in phylogenetics under the assumption of the strong clock hypothesis, i.e., constant evolutionary rates (Dress et al. 2007). Dating of the internal nodes (Britton et al. 2007) transforms an (additive) phylogeny into an ultrametric tree. Ultrametrics are a special case of additive metrics.
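A short sketch may help to separate the two notions. It tests (MU) on all triples and contrasts a clock-like three-point metric with an additive one that is not ultrametric. The helper names `symmetric` and `is_ultrametric` and the numerical values are assumptions made for illustration.

```python
from itertools import permutations

def symmetric(values):
    """Expand {(x, y): value} into a dictionary holding both orders."""
    dist = {}
    for (x, y), value in values.items():
        dist[x, y] = dist[y, x] = value
    return dist

def is_ultrametric(points, dist, tol=1e-9):
    """Check the strong triangle inequality (MU) on all triples of distinct points."""
    return all(
        dist[x, z] <= max(dist[x, y], dist[y, z]) + tol
        for x, y, z in permutations(points, 3)
    )

points = ["a", "b", "c"]

# Clock-like case: a and b are siblings, c is equally far from both.
clock = symmetric({("a", "b"): 2.0, ("a", "c"): 4.0, ("b", "c"): 4.0})

# Additive but not clock-like: star tree with pendant edge lengths 1, 2, 3.
star = symmetric({("a", "b"): 3.0, ("a", "c"): 4.0, ("b", "c"): 5.0})

print(is_ultrametric(points, clock))  # True
print(is_ultrametric(points, star))   # False: 5 > max(3, 4)
```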
Real-life data sets, unfortunately, almost never satisfy the four-point condition. As a remedy, Sattah and Tversky (1977) and Fitch (1981) suggested considering a “split relation” on pairs of objects, often referred to as quadruples, defined by
$$\begin{aligned} uv \Vert xy \,\iff \, d(u,v)+d(x,y) < {\left\{ \begin{array}{ll} d(u,x)+d(v,y) \\ d(u,y)+d(v,x) \end{array}\right. } \end{aligned}.$$
(1)
The relation \(\Vert\) has been studied extensively and, under certain additional conditions, can provide sufficient information for reconstructing phylogenetic trees (Bandelt and Dress 1986) or at least phylogenetic networks (Bandelt and Dress 1992; Grünewald et al. 2009). The approximation of a given metric by additive metrics or ultrametrics, given some measure of the goodness of fit, has also received considerable attention (Farach et al. 1996; Agarwala et al. 1998; Apostolico et al. 2013).
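The relation in Eq. (1) can be evaluated directly from a distance table. The sketch below assumes that the brace in Eq. (1) is read as "strictly smaller than both of the other two sums"; the function names and the example distances are hypothetical.

```python
def symmetric(values):
    """Expand {(x, y): value} into a dictionary holding both orders."""
    dist = {}
    for (x, y), value in values.items():
        dist[x, y] = dist[y, x] = value
    return dist

def split_relation(dist, u, v, x, y):
    """True if uv || xy, assuming Eq. (1) means 'smaller than both other sums'."""
    s = dist[u, v] + dist[x, y]
    return s < dist[u, x] + dist[v, y] and s < dist[u, y] + dist[v, x]

def quartet_splits(dist, u, v, x, y):
    """List the pairings of {u, v, x, y} that are supported by the relation."""
    candidates = [((u, v), (x, y)), ((u, x), (v, y)), ((u, y), (v, x))]
    return [(p, q) for p, q in candidates
            if split_relation(dist, p[0], p[1], q[0], q[1])]

# Distances of a quartet tree in which a, b and c, d form the two cherries.
dist = symmetric({("a", "b"): 3.0, ("c", "d"): 2.0, ("a", "c"): 5.5,
                  ("b", "d"): 5.5, ("a", "d"): 4.5, ("b", "c"): 6.5})
print(quartet_splits(dist, "a", "b", "c", "d"))
# [(('a', 'b'), ('c', 'd'))]

# If all pairwise distances coincide, no pairing is supported.
flat = symmetric({(p, q): 1.0 for p in "abcd" for q in "abcd" if p < q})
print(quartet_splits(flat, "a", "b", "c", "d"))  # []
```

At most one of the three pairings can be supported under this reading, since two strict inequalities between the same sums would contradict each other.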
Here, we ask under which conditions distance data that may deviate from additivity in a systematic manner still yield a phylogenetically (more or less) correct relation \(\Vert\). This is different from the inference problems mentioned above: Our task is not to minimize a uniform error functional but to deal with systematic distortions of the distance measurements. In order to formalize the problem setting, we assume that the evolutionary process under consideration (operating on a space \(X\)) generates an additive metric \(t: X\times X\rightarrow {\mathbb {R}}_0^+\). The catch is that we have no knowledge of \(X\) and we cannot directly access \(t\). We can, however, obtain partial knowledge from representations. That is, there is a function \(\varphi :X\rightarrow Y\). The construction of the representation in \(Y\) depends on our theory of what is important about the evolving system. In molecular phylogenetics, \(Y\) may be chosen to be a space of sequences. In classical, morphology-based phylogenetics, the elements of \(Y\) are character-based descriptions of animals; attempts to use molecular structures for phylogenetic purposes might use RNA secondary structures or labeled graph representations of protein 3D structures; a historical linguist might choose word lists or grammatical features.
Once we have decided on representations, we can turn to measuring (dis)similarities between them. The concrete choice of a distance measure \({{\tilde{d}}}:Y\times Y\rightarrow {\mathbb {R}}_0^+\) of course again depends on the theoretical conception of the underlying evolutionary process. We can easily reinterpret \({\tilde{d}}\) as a distance measure on \(X\) by setting
$$\begin{aligned} d(x,y) := {{\tilde{d}}}(\varphi (x),\varphi (y)) \end{aligned}.$$
(2)
It is easy to see that \(d:X\times X\rightarrow {\mathbb {R}}_0^+\) is a metric whenever \({\tilde{d}}\) is a metric and \(\varphi :X\rightarrow Y\) is injective, i.e., whenever our representation is good enough to distinguish objects in \(X\). There is no a priori reason to make this assumption, however. Consider, for example, RNA secondary structures as a function of the primary sequences. This map is highly redundant (Schuster et al. 1994); for example, most tRNAs share the standard clover-leaf structure despite very different sequences and divergence times that pre-date the common ancestor of all extant life forms (Eigen et al. 1989); distances between secondary structures therefore do not reflect all evolutionary processes. Formally, \(d\) is not a metric but only a pseudometric in this case: it no longer satisfies axiom (M1). We will ignore this complication here and assume for simplicity that \(d:X\times X\rightarrow {\mathbb {R}}_0^+\) is a metric.
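The effect of a non-injective representation on Eq. (2) can be seen in a toy example. In the sketch below, `phi` maps an RNA sequence to its nucleotide composition and `d_tilde` is the \(L_1\) distance between composition vectors; both are hypothetical stand-ins chosen for brevity and are far simpler than the secondary-structure map discussed above.

```python
from collections import Counter

def phi(sequence):
    """Hypothetical representation: nucleotide composition (not injective)."""
    counts = Counter(sequence)
    return tuple(counts[base] for base in "ACGU")

def d_tilde(p, q):
    """L1 distance between composition vectors; a metric on Y."""
    return sum(abs(a - b) for a, b in zip(p, q))

def d(x, y):
    """Distance on X induced by the representation, as in Eq. (2)."""
    return d_tilde(phi(x), phi(y))

print(d("ACGU", "UGCA"))  # 0, although the sequences differ: (M1) fails
print(d("ACGU", "AAGU"))  # 2
```

Distinct sequences with the same composition are at distance zero, so the induced \(d\) is only a pseudometric here.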
The metric \(d\) is of interest for phylogenetic purposes if it quantifies evolutionary divergence in a meaningful way. That is, we are concerned with the information about the underlying additive metric \(t\) that can be extracted from \(d\). Without additional assumptions on the relationships between \(t\) and \(d\), however, nothing much can be said. At the very least, our representation \((Y,{\tilde{d}})\) should be good enough to recognize whether one of two objects \(y\) or \(z\) has diverged further from a given reference point \(x\) than the other. Hence, we assume that for all \(x,y,z\in X\):
- (m0) \(t(x,y)<t(x,z)\) implies \(d(x,y)<d(x,z)\).
In the absence of at least this very weak form of monotonicity, we cannot really hope to recover information about \(t\) from measuring \(d\). To our knowledge, property (m0) has not received much attention in the past. The following, stronger condition, however, has been considered extensively:
- (m1) \(t(x,y)<t(u,v)\) implies \(d(x,y)<d(u,v)\)
for all \(u,v,x,y\in X\). This property is known as (strong) monotonicity (Kruskal 1964) and lies at the heart of non-metric multi-dimensional scaling, a set of techniques that aim at approximating dissimilarity data by a Euclidean metric (Borg and Groenen 2005). A commonly used criterion is to minimize the violations of condition (m1). It is interesting to note in this context that, given any input metric \(d\), there is always a Euclidean metric \(\delta\) that is connected with \(d\) by strong monotonicity, provided the embedding space is of sufficiently high dimension (Agarwal et al. 2007). In our context, it will be interesting to investigate whether there is an analogous result for additive metrics.
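The difference between (m0) and (m1) becomes visible on four points, since (m0) only compares pairs that share a reference point. The following sketch uses hypothetical values: `t` is the metric of a quartet tree with cherries \(\{a,b\}\) and \(\{c,d\}\), and `d` swaps the order of the two cherry distances.

```python
from itertools import product

def make_dist(points, values):
    """Symmetric distance table with zero diagonal."""
    dist = {(p, p): 0.0 for p in points}
    for (x, y), value in values.items():
        dist[x, y] = dist[y, x] = value
    return dist

def satisfies_m0(points, t, d):
    """(m0): t(x,y) < t(x,z) implies d(x,y) < d(x,z)."""
    return all(d[x, y] < d[x, z]
               for x, y, z in product(points, repeat=3)
               if t[x, y] < t[x, z])

def satisfies_m1(points, t, d):
    """(m1): t(x,y) < t(u,v) implies d(x,y) < d(u,v)."""
    pairs = list(product(points, repeat=2))
    return all(d[p] < d[q]
               for p, q in product(pairs, repeat=2)
               if t[p] < t[q])

points = ["a", "b", "c", "d"]
cross = {(p, q): 5.0 for p in "ab" for q in "cd"}
t = make_dist(points, {("a", "b"): 1.0, ("c", "d"): 2.0, **cross})
d = make_dist(points, {("a", "b"): 2.0, ("c", "d"): 1.0, **cross})

print(satisfies_m0(points, t, d))  # True
print(satisfies_m1(points, t, d))  # False: t(a,b) < t(c,d) but d(a,b) > d(c,d)
```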
If we insist, in addition, that ties are preserved, i.e., that \(t(x,y)=t(u,v)\) is equivalent to \(d(x,y)=d(u,v)\), then there exists an increasing function \(\zeta :{\mathbb {R}}_0^+\rightarrow {\mathbb {R}}_0^+\) such that \(d=\zeta (t)\). In the following, we will consider this (more restrictive) setting in some detail.
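To make the setting \(d=\zeta (t)\) concrete, the sketch below applies the increasing function \(\zeta (s)=\sqrt{s}\) to the metric of a small quartet tree. The choice of \(\zeta\), the tree, and all function names are assumptions made for this example; it illustrates a single instance and is not a general statement about the relation \(\Vert\).

```python
import math
from itertools import product

def make_dist(points, values):
    """Symmetric distance table with zero diagonal."""
    dist = {(p, p): 0.0 for p in points}
    for (x, y), value in values.items():
        dist[x, y] = dist[y, x] = value
    return dist

def satisfies_four_point(points, dist, tol=1e-9):
    """Check the four-point condition (MA) for all quadruples."""
    return all(
        dist[u, v] + dist[x, y]
        <= max(dist[u, x] + dist[v, y], dist[u, y] + dist[v, x]) + tol
        for u, v, x, y in product(points, repeat=4)
    )

def split(dist, u, v, x, y):
    """uv || xy, read as 'strictly smaller than both other sums'."""
    s = dist[u, v] + dist[x, y]
    return s < dist[u, x] + dist[v, y] and s < dist[u, y] + dist[v, x]

points = ["a", "b", "c", "d"]
# Additive metric t of a quartet tree with cherries {a, b} and {c, d}.
t = make_dist(points, {("a", "b"): 3.0, ("c", "d"): 2.0, ("a", "c"): 5.5,
                       ("b", "d"): 5.5, ("a", "d"): 4.5, ("b", "c"): 6.5})
# Distorted measurement d = zeta(t) with the hypothetical choice zeta = sqrt.
d = {pair: math.sqrt(value) for pair, value in t.items()}

print(satisfies_four_point(points, t))   # True
print(satisfies_four_point(points, d))   # False: additivity is lost
print(split(t, "a", "b", "c", "d"), split(d, "a", "b", "c", "d"))  # True True
```

In this instance the distorted distances are no longer additive, yet the relation \(ab \Vert cd\) of the underlying tree is still recovered from \(d\).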