New Gromov-Inspired Metrics on Phylogenetic Tree Space

Liebscher, Volkmar

doi:10.1007/s11538-017-0385-z

Original Article
Published: 02 January 2018

Bulletin of Mathematical Biology, volume 80, pages 493–518 (2018)

Volkmar Liebscher ORCID: orcid.org/0000-0003-1446-4423¹

Abstract

We present a new class of metrics for unrooted phylogenetic X-trees inspired by the Gromov–Hausdorff distance for (compact) metric spaces. These metrics can be efficiently computed by linear or quadratic programming. They are robust under NNI operations, too. The local behaviour of the metrics shows that they are different from any previously introduced metrics. The performance of the metrics is briefly analysed on random weighted and unweighted trees as well as random caterpillars.

References

Agarwal PK, Fox K, Nath A, Sidiropoulos A, Wang Y (2015) Computing the Gromov–Hausdorff distance for metric trees. In: Elbassioni K, Makino K (eds) Algorithms and computation. Lecture Notes in Computer Science, vol 9472, pp 529–540. Springer, Berlin. arXiv:1509.05751
Allen BL, Steel M (2001) Subtree transfer operations and their induced metrics on evolutionary trees. Ann Comb 5:1–15
Benner P, Bačak M, Bourguignon P-Y (2014) Point estimates in phylogenetic reconstructions. Bioinformatics 30:i534–i540
Berkelaar M et al (2015) lpSolve: Interface to “Lp_solve” v. 5.5 to solve linear/integer programs. R package version 5.6.13. https://CRAN.R-project.org/package=lpSolve
Bernstein DI (2017) L-infinity optimization to Bergman fans of matroids with an application to phylogenetics. arXiv:1702.05141
Bernstein DI, Long C (2017) L-infinity optimization to linear spaces and phylogenetic trees. arXiv:1702.05127
Billera LJ, Holmes SP, Vogtmann K (2001) Geometry of the space of phylogenetic trees. Adv Appl Math 27(4):733–767
Bogdanowicz D, Giaro K (2012) Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans Comput Biol Bioinform 9(1):150–160
Bonet ML, St. John K (2010) On the complexity of uSPR distance. IEEE/ACM Trans Comput Biol Bioinform 7(3):572–576
Bourque M (1978) Arbres de Steiner et reseaux dont certains sommets sont a localisation variable. PhD thesis, Montreal
Brodal GS, Fagerberg R, Pedersen CNS (2001) Computing the quartet distance between evolutionary trees in time $O(n\log^2 n)$. In: Proceedings of the 12th international symposium on algorithms and computation (ISAAC). Lecture Notes in Computer Science, vol 2223, pp 731–737. Springer
Buneman P (1971) The recovery of trees from measures of dissimilarity. In: Kendall DG, Tautu P (eds) Mathematics in the archeological and historical sciences. Edinburgh University Press, Edinburgh, pp 387–395
Buneman P (1974) A note on the metric properties of trees. J Comb Theory 17(1):48–50
Burago D, Burago Y, Ivanov S (2001) A course in metric geometry. Graduate studies in mathematics, vol 33. American Mathematical Society, Providence
Chakerian J, Holmes S (2017) Distory: distance between phylogenetic histories. R package version 1.4.3. http://CRAN.R-project.org/package=distory
Coons JI, Rusinko J (2016) A note on the path interval distance. J Theor Biol 398:145–149
Cristina J (2008) Gromov–Hausdorff convergence of metric spaces, Helsinki. http://www.helsinki.fi/~cristina/pdfs/gromovHausdorff.pdf. Accessed 2 Feb 2015
DasGupta B, He X, Jiang T, Li M, Tromp J, Zhang L (1997) On distances between phylogenetic trees. In: Proceedings of the eighth ACM/SIAM symposium discrete algorithms (SODA ’97), pp 427–436
Day WHE (1985) Optimal algorithms for comparing trees with labeled leaves. J Classif 2(1):7–28
Dress A (1984) Trees, tight extensions of metric spaces, and the cohomological dimension of certain groups: a note on combinatorial properties of metric spaces. Adv Math 53(3):321–402
Dress A, Holland B, Huber KT, Koolen J, Moulton V, Weyer-Menkoff J (2005) $\Delta$-additive and $\Delta$-ultra-additive maps, Gromov’s trees and the Farris transform. Discrete Appl Math 146:51–73
Edwards DA (1975) The structure of superspace. In: Stavrakas NM, Allen KR (eds) Studies in topology. Academic Press, New York, pp 121–133
Estabrook GF, McMorris FR, Meacham CA (1985) Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst Zool 34(2):193–200
Fischer M, Kelk S (2016) On the maximum parsimony distance between phylogenetic trees. Ann Comb 20(1):87–113
Gavryushkin A, Drummond A (2016) The space of ultrametric phylogenetic trees. J Theor Biol 403:197–208
Gromov M (1981) Groups of polynomial growth and expanding maps. Publ Math IHÉS 53:53–73
Guénoche A, Leclerc B, Makarenkov V (2004) On the extension of a partial metric to a tree metric. Discrete Math 276:229–248
Hoffman AJ, Kruskal J (2010) Introduction to integral boundary points of convex polyhedra. In: Jünger M et al (eds) 50 years of integer programming, 1958–2008. Springer, Berlin, pp 49–50
Huggins P, Owen M, Yoshida R (2012) First steps toward the geometry of cophylogeny. In: Hibi T (ed) Harmony of Gröbner bases and the modern industrial society. World Scientific, Singapore, pp 99–116
Isbell JR (1964) Six theorems about injective metric spaces. Comment Math Helv 39(1):65–76
Karmarkar N (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4(4):373–395
Kelk S, Fischer M (2017) On the complexity of computing MP distance between binary phylogenetic trees. Ann Comb 21(4):573–604
Kendall M, Colijn C (2016) Mapping phylogenetic trees to reveal distinct patterns of evolution. Mol Biol Evol 33(10):2735–2743
Lang U, Pavón M, Züst R (2013) Metric stability of trees and tight spans. Arch Math 101(1):91–100
Liebscher V (2015) gromovlab: Gromov–Hausdorff type distances for labeled metric spaces. R package version 0.7-6. http://CRAN.R-project.org/package=gromovlab
Lin Y, Rajan V, Moret BME (2012) A metric for phylogenetic trees based on matching. IEEE/ACM Trans Comput Biol Bioinform 9(4):1014–1022
Lin B, Sturmfels B, Tang X, Yoshida R (2017) Convexity in tree spaces. SIAM J Discrete Math 31(3):2015–2038
Mémoli F (2007) On the use of Gromov–Hausdorff distances for shape comparison. In: Symposium on point based graphics, Prague, Sept 2007
Moulton V, Wu T (2015) A parsimony-based metric for phylogenetic trees. Adv Appl Math 66:22–45
Nye TMW (2011) Principal components analysis in the space of phylogenetic trees. Ann Stat 39(5):2716–2739
Owen M, Provan J (2011) A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans Comput Biol Bioinform 8(1):2–13
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20(2):289–290
Pardalos PM, Wolkowicz H (eds) (1994) Quadratic assignment and related problems. DIMACS series in discrete mathematics and theoretical computer science, vol 16. AMS, Providence, RI. Papers from the workshop held at Rutgers University, New Brunswick, New Jersey, May 20–21, 1993
Pattengale ND, Gottlieb EJ, Moret BM (2007) Efficiently computing the Robinson–Foulds metric. J Comput Biol 14(6):724–735
Penny D, Hendy MD (1985) The use of tree comparison metrics. Syst Biol 34(1):75–82
R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, version 3.4.3, Vienna, Austria. http://www.R-project.org/
Robinson DF (1971) Comparison of labeled trees with valency three. J Comb Theory 11:105–119
Robinson DF, Foulds LR (1979) Comparison of weighted labelled trees. In: Combinatorial mathematics VI. Lecture Notes in Mathematics, vol 748, pp 119–126. Springer, Berlin
Robinson DF, Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci 53:131–147
Semple C, Steel MA (2003) Phylogenetics. Oxford University Press, Oxford
Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon 11:33–40
Steel MA, Penny D (1993) Distributions of tree comparison metrics—some new results. Syst Biol 42(2):126–141
Tuzhilin AA (2016) Who invented the Gromov–Hausdorff distance? arXiv:1612.00728
Villar S, Bandeira AS, Blumberg AJ, Ward R (2016) A polynomial-time relaxation of the Gromov–Hausdorff distance. arXiv:1610.05214
Whidden C, Beiko RG, Zeh N (2016) Fixed-parameter and approximation algorithms for maximum agreement forests of multifurcating trees. Algorithmica 74(3):1019–1054
Williams WT, Clifford HT (1971) On the comparison of two classifications of the same set of elements. Taxon 20:519–522
Zaretskii KA (1965) Constructing a tree on the basis of a set of distances between the hanging vertices (in Russian). Uspekhi Mat Nauk 20(6):90–92

Acknowledgements

First of all, I have to thank Mareike Fischer for introducing me to the world of phylogenetic distances. She also helped a lot in arriving at a clear notation. Second, I’m very grateful to Jürgen Eichhorn, who unknowingly drew my attention to metrics between metric spaces. Third, I’d like to thank Michelle Kendall for her inspiring talk at the Portobello conference 2015 and for additional discussions later. Fourth, I thank Mike Steel for many interesting discussions, useful hints, his kind hospitality during my stay in Christchurch in 2010, and for the organisation of the amazing 2015 workshop in Kaikoura with its inspiring and open atmosphere. Further, Miroslav Bačak, Andrew Francis, Alexander Gavryushkin, Stefan Grünewald, Marc Hellmuth and Giulio dalla Riva gave useful hints and inspiration in many discussions. The questions and hints of five anonymous referees regarding previous versions of this manuscript helped to improve it substantially.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Greifswald, 17487, Greifswald, Germany
Volkmar Liebscher

Corresponding author

Correspondence to Volkmar Liebscher.

A On Semimetric Extensions

Several times we encountered the question of whether a partial dissimilarity on X, i.e. a map $q:E\rightarrow \mathbb{R}_{\ge 0}$ with $E\subseteq \binom{X}{2}$, has an extension to a semimetric on X. This seems to be a well-known problem; one folklore solution can be found in Guénoche et al. (2004). For our needs, the following reformulation proved more useful.

We call a cycle $p=x_0x_1\dots x_m$, $x_0=x_m$, in a graph (X, E) induced if it is simple (the vertices $x_i$, $i=0,\dots,m-1$, are pairwise different) and chordless ($\{x_i,x_j\}\notin E$ for $0\le i,j\le m-1$ with $2\le |i-j|\le m-2$).

Theorem 6

If the graph $G=(X,E)$ is connected, then $q:E\rightarrow \mathbb {R}_{\ge 0}$ extends to a semimetric on X if and only if for all induced cycles p of G and all edges e in p

$$\begin{aligned} 2q(e)\le \mathrm{len}(p). \end{aligned}\tag{17}$$
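
As a quick sanity check of condition (17), consider the smallest possible cycle, a triangle $p=x_0x_1x_2x_0$. Assuming, as in the proof below, that $\mathrm{len}(p)$ is the sum of $q$ over the edges of $p$, condition (17) for the edge $e=\{x_0,x_1\}$ is just the triangle inequality:

$$\begin{aligned} 2q(\{x_0,x_1\})\le q(\{x_0,x_1\})+q(\{x_1,x_2\})+q(\{x_2,x_0\}) \iff q(\{x_0,x_1\})\le q(\{x_1,x_2\})+q(\{x_2,x_0\}). \end{aligned}$$

For longer induced cycles, (17) says in the same way that no edge may be longer than the rest of the cycle.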

Proof

By Guénoche et al. (2004), Proposition 2.1, q has a semimetric extension if and only if $q(\{x,y\})=d^q_G(x,y)$ for all $\{x,y\}\in E$, where $d^q_G$ was introduced in (3).

Suppose first that q extends to a semimetric. Fix an induced cycle $p=x_0x_1\dots x_{m-1} x_m$, $x_m=x_0$, and the edge $e=\{x_0,x_1\}$ in p. We obtain

$$\begin{aligned} q(\{x_0,x_1\}) &= d^q_G(x_0,x_1)\le \mathrm{len}(x_1\dots x_{m-1}x_0)= \sum_{k=1}^{m-1}q(\{x_k,x_{k+1}\}),\\ 2q(\{x_0,x_1\}) &\le q(\{x_0,x_1\})+ \sum_{k=1}^{m-1}q(\{x_k,x_{k+1}\})=\mathrm{len}(p). \end{aligned}$$

Now assume (17) is fulfilled, but there is no extension to a semimetric. Thus, we find $\{x,y\}\in E$ such that $q(\{x,y\})>d^q_G(x,y)$. This means there is a path $\tilde{p}=x_0x_1\dots x_{m-1}$, $x_0=x$, $x_{m-1}=y$, such that

$$\begin{aligned} q(\{x_0,x_{m-1}\})>\mathrm{len}(\tilde{p})=\sum_{k=0}^{m-2}q(\{x_k,x_{k+1}\}). \end{aligned}$$

We may assume w.l.o.g. that m is minimal. Thus, the vertices $x_i$, $i=0,\dots,m-1$, are pairwise different. Setting $x_m=x_0$ and $e=\{x,y\}=\{x_0,x_{m-1}\}$, the (simple) cycle $p=x_0x_1\dots x_m$ violates (17). Suppose now that p has a chord, say $\{x_i,x_j\}$ with $i<j$. Since m is minimal, we know

$$\begin{aligned} q(\{x_i,x_j\})\le \sum_{k=i}^{j-1}q(\{x_k,x_{k+1}\}) \end{aligned}$$

and

$$\begin{aligned} q(\{x_0,x_{m-1}\})\le \sum_{k=0}^{i-1}q(\{x_k,x_{k+1}\})+q(\{x_i,x_j\})+\sum_{k=j}^{m-2}q(\{x_k,x_{k+1}\}). \end{aligned}$$

Substituting the first inequality into the right-hand side of the second one yields

$$\begin{aligned} q(\{x_0,x_{m-1}\})\le \sum_{k=0}^{m-2}q(\{x_k,x_{k+1}\})=\mathrm{len}(\tilde{p}), \end{aligned}$$

which contradicts the choice of $\tilde{p}$. This contradiction shows that p is chordless, hence an induced cycle violating (17), and completes the proof. $\square$
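
The criterion from Guénoche et al. (2004) used in this proof is easy to test numerically. The following is a minimal sketch, not the author's implementation: it assumes that $d^q_G$ from (3) is the shortest-path length in the weighted graph (which is how the proof uses it), and the names `path_distance` and `has_semimetric_extension` are hypothetical helpers introduced only for illustration.

```python
import math


def path_distance(vertices, q):
    """Shortest-path lengths in the weighted graph (X, E) via Floyd-Warshall.

    `q` maps frozenset({u, v}) to a nonnegative edge weight.
    Assumption: this is what d^q_G from (3) denotes.
    """
    d = {(u, v): (0.0 if u == v else math.inf)
         for u in vertices for v in vertices}
    for edge, w in q.items():
        u, v = tuple(edge)
        d[u, v] = d[v, u] = min(d[u, v], w)
    for k in vertices:
        for u in vertices:
            for v in vertices:
                if d[u, k] + d[k, v] < d[u, v]:
                    d[u, v] = d[u, k] + d[k, v]
    return d


def has_semimetric_extension(vertices, q, tol=1e-12):
    """True iff q(e) equals the path distance between the ends of every edge e."""
    d = path_distance(vertices, q)
    return all(w - d[tuple(edge)] <= tol for edge, w in q.items())


# A 4-cycle a-b-c-d-a: the edge {d, a} must not be longer than the other three.
V = ["a", "b", "c", "d"]
ok = {frozenset(("a", "b")): 1.0, frozenset(("b", "c")): 1.0,
      frozenset(("c", "d")): 1.0, frozenset(("d", "a")): 3.0}
too_long = dict(ok)
too_long[frozenset(("d", "a"))] = 5.0
print(has_semimetric_extension(V, ok))        # True:  2 * 3 <= 6
print(has_semimetric_extension(V, too_long))  # False: 2 * 5 >  8
```

On the 4-cycle, which is itself an induced cycle, the two outcomes agree with condition (17), as Theorem 6 predicts.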

We can use this result for the

Proof of Theorem 2

We apply Theorem 6 to $X\cup X'$, $E=\binom{X}{2}\cup \binom{X'}{2}\cup \{\{x,x'\}:x\in X\}$ and $q:E\rightarrow \mathbb{R}_{\ge 0}$ given by

$$\begin{aligned} q(\{u,v\})={\begin{cases} \rho(u,v) &\quad u,v\in X,\\ \rho'(x,y) &\quad u=x',\ v=y',\ x,y\in X,\\ \delta_x &\quad u=x,\ v=x',\ x\in X. \end{cases}} \end{aligned}$$

Induced cycles in $(X\cup X',E)$ are either triangles in X, triangles in $X'$ or quadrangles $x,y,y',x',x$. For the two former, (17) is equivalent to the triangle inequalities for $\rho ,\rho '$. For the latter, (17) is the same as (5). $\square $
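
To make the quadrangle case explicit: the cycle $x,y,y',x',x$ has the four edges $\{x,y\}$, $\{y,y'\}$, $\{y',x'\}$, $\{x',x\}$ with weights $\rho(x,y)$, $\delta_y$, $\rho'(x,y)$, $\delta_x$. Writing out (17) for each of these edges and cancelling the edge's own weight on both sides gives the following four inequalities. Condition (5) is not reproduced in this section, so this is only a sketch of what (17) amounts to here, which by the proof should coincide with (5):

$$\begin{aligned} \rho(x,y)-\rho'(x,y)&\le \delta_x+\delta_y, \qquad & \rho'(x,y)-\rho(x,y)&\le \delta_x+\delta_y,\\ \delta_x-\delta_y&\le \rho(x,y)+\rho'(x,y), \qquad & \delta_y-\delta_x&\le \rho(x,y)+\rho'(x,y). \end{aligned}$$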

The following result was used in the proof of Theorem 1.

Lemma 7

Suppose X, Y, Z are disjoint sets and there are given $d_1\in M(X\cup Y)$ and $d_2\in M(Y\cup Z)$ such that $d_1|_{\binom{Y}{2}}=d_2|_{\binom{Y}{2}}$. Then, there exists a $d\in M(X\cup Y\cup Z)$ such that $d|_{\binom{X\cup Y}{2}}=d_1$ and $d|_{\binom{Y\cup Z}{2}}=d_2$.

Proof

We apply Theorem 6 to the graph $\left( X\cup Y\cup Z,\binom{X\cup Y}{2}\cup \binom{Y\cup Z}{2}\right)$ with q, which is well defined because $d_1$ and $d_2$ agree on $\binom{Y}{2}$, given by

$$\begin{aligned} q(\{u,v\})={\begin{cases} d_1(u,v) &\quad u,v\in X\cup Y,\\ d_2(u,v) &\quad u,v\in Y\cup Z. \end{cases}} \end{aligned}$$

Since both $X\cup Y$ and $Y\cup Z$ are complete in this graph, the only induced cycles are triangles. The triangle inequalities for $d_1,d_2$ show (17). $\square $
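
One concrete extension whose existence Lemma 7 guarantees can be written down directly. The sketch below is an illustration under assumptions, not part of the paper: it assumes that $M(S)$ denotes semimetrics on $S$, that Y is nonempty, and it uses the shortest-path choice $d(x,z)=\min_{y\in Y}\,(d_1(x,y)+d_2(y,z))$ for $x\in X$, $z\in Z$. The helpers `sym`, `glue` and `is_semimetric` are hypothetical names.

```python
def sym(table):
    """Turn a dict keyed by frozenset pairs into a symmetric distance function."""
    return lambda u, v: 0.0 if u == v else table[frozenset((u, v))]


def glue(X, Y, Z, d1, d2):
    """Glue d1 on X∪Y and d2 on Y∪Z along Y (Y must be nonempty).

    Pairs inside X∪Y keep d1, pairs inside Y∪Z keep d2, and a pair (x, z)
    gets the length of the shortest two-step path x -> y -> z through Y.
    """
    pts = list(X) + list(Y) + list(Z)
    d = {}
    for u in pts:
        for v in pts:
            if u == v:
                d[u, v] = 0.0
            elif u in X and v in Z:
                d[u, v] = min(d1(u, y) + d2(y, v) for y in Y)
            elif u in Z and v in X:
                d[u, v] = min(d1(v, y) + d2(y, u) for y in Y)
            elif u in Z or v in Z:
                d[u, v] = d2(u, v)  # both points lie in Y∪Z
            else:
                d[u, v] = d1(u, v)  # both points lie in X∪Y
    return pts, d


def is_semimetric(pts, d, tol=1e-12):
    """Check nonnegativity, symmetry and all triangle inequalities."""
    return all(
        d[u, v] >= 0
        and abs(d[u, v] - d[v, u]) <= tol
        and all(d[u, v] <= d[u, w] + d[w, v] + tol for w in pts)
        for u in pts for v in pts
    )


# Toy data: d1 and d2 agree on the single pair {y1, y2} inside Y.
X, Y, Z = {"x"}, {"y1", "y2"}, {"z"}
d1 = sym({frozenset(("x", "y1")): 1.0, frozenset(("x", "y2")): 2.0,
          frozenset(("y1", "y2")): 2.0})
d2 = sym({frozenset(("y1", "y2")): 2.0, frozenset(("y1", "z")): 3.0,
          frozenset(("y2", "z")): 1.0})
pts, d = glue(X, Y, Z, d1, d2)
print(d["x", "z"], is_semimetric(pts, d))
```

In this toy example the new distance is $d(x,z)=\min(1+3,\,2+1)=3$, and the glued map restricts to $d_1$ and $d_2$ on the two original sets.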

About this article

Cite this article

Liebscher, V. New Gromov-Inspired Metrics on Phylogenetic Tree Space. Bull Math Biol 80, 493–518 (2018). https://doi.org/10.1007/s11538-017-0385-z

Received: 17 February 2017
Accepted: 19 December 2017
Published: 02 January 2018
Issue Date: March 2018
DOI: https://doi.org/10.1007/s11538-017-0385-z
