Abstract
We present a new class of metrics for unrooted phylogenetic X-trees inspired by the Gromov–Hausdorff distance for (compact) metric spaces. These metrics can be computed efficiently by linear or quadratic programming, and they are robust under NNI operations. The local behaviour of the metrics shows that they differ from all previously introduced metrics. The performance of the metrics is briefly analysed on random weighted and unweighted trees as well as random caterpillars.
References
Agarwal PK, Fox K, Nath A, Sidiropoulos A, Wang Y (2015) Computing the Gromov–Hausdorff distance for metric trees. In: Elbassioni K, Makino K (eds) Algorithms and computation. Lecture Notes in Computer Science, vol 9472, pp 529–540. Springer, Berlin. arXiv:1509.05751
Allen BL, Steel M (2001) Subtree transfer operations and their induced metrics on evolutionary trees. Ann Comb 5:1–15
Benner P, Bačak M, Bourguignon P-Y (2014) Point estimates in phylogenetic reconstructions. Bioinformatics 30:i534–i540
Berkelaar M et al (2015) lpSolve: Interface to “Lp_solve” v. 5.5 to solve linear/integer programs. R package version 5.6.13. https://CRAN.R-project.org/package=lpSolve
Bernstein DI (2017) L-infinity optimization to Bergman fans of matroids with an application to phylogenetics. arXiv:1702.05141
Bernstein DI, Long C (2017) L-infinity optimization to linear spaces and phylogenetic trees. arXiv:1702.05127
Billera LJ, Holmes SP, Vogtmann K (2001) Geometry of the space of phylogenetic trees. Adv Appl Math 27(4):733–767
Bogdanowicz D, Giaro K (2012) Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans Comput Biol Bioinform 9(1):150–160
Bonet ML, St. John K (2010) On the complexity of uSPR distance. IEEE/ACM Trans Comput Biol Bioinform 7(3):572–576
Bourque M (1978) Arbres de Steiner et reseaux dont certains sommets sont a localisation variable. PhD thesis, Montreal
Brodal GS, Fagerberg R, Pedersen CNS (2001) Computing the quartet distance between evolutionary trees in time \(O(n\log ^2n)\). In: Proceedings of the 12th international symposium on algorithms and computation (ISAAC). Lecture Notes in Computer Science, vol 2223, pp 731–737. Springer
Buneman P (1971) The recovery of trees from measures of dissimilarity. In: Kendall DG, Tautu P (eds) Mathematics in the archeological and historical sciences. Edinburgh University Press, Edinburgh, pp 387–395
Buneman P (1974) A note on the metric properties of trees. J Comb Theory 17(1):48–50
Burago D, Burago Y, Ivanov S (2001) A course in metric geometry. Graduate studies in mathematics, vol 33. American Mathematical Society, Providence
Chakerian J, Holmes S (2017) Distory: distance between phylogenetic histories. R package version 1.4.3. http://CRAN.R-project.org/package=distory
Coons JI, Rusinko J (2016) A note on the path interval distance. J Theor Biol 398:145–149
Cristina J (2008) Gromov–Hausdorff convergence of metric spaces, Helsinki. http://www.helsinki.fi/~cristina/pdfs/gromovHausdorff.pdf. Accessed 2 Feb 2015
DasGupta B, He X, Jiang T, Li M, Tromp J, Zhang L (1997) On distances between phylogenetic trees. In: Proceedings of the eighth ACM/SIAM symposium discrete algorithms (SODA ’97), pp 427–436
Day WHE (1985) Optimal algorithms for comparing trees with labeled leaves. J Classif 2(1):7–28
Dress A (1984) Trees, tight extensions of metric spaces, and the cohomological dimension of certain groups: a note on combinatorial properties of metric spaces. Adv Math 53(3):321–402
Dress A, Holland B, Huber KT, Koolen J, Moulton V, Weyer-Menkoff J (2005) \(\Delta \)-additive and \(\Delta \)-ultra-additive maps, Gromov’s trees and the Farris transform. Discrete Appl Math 146:51–73
Edwards DA (1975) The structure of superspace. In: Stavrakas NM, Allen KR (eds) Studies in topology. Academic Press, New York, pp 121–133
Estabrook GF, McMorris FR, Meacham CA (1985) Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst Zool 34(2):193–200
Fischer M, Kelk S (2016) On the maximum parsimony distance between phylogenetic trees. Ann Comb 20(1):87–113
Gavryushkin A, Drummond A (2016) The space of ultrametric phylogenetic trees. J Theor Biol 403:197–208
Gromov M (1981) Groups of polynomial growth and expanding maps. Publ Math IHÉS 53:53–73
Guénoche A, Leclerc B, Makarenkov V (2004) On the extension of a partial metric to a tree metric. Discrete Math 276:229–248
Hoffman AJ, Kruskal J (2010) Introduction to integral boundary points of convex polyhedra. In: Jünger M et al (eds) 50 years of integer programming, 1958–2008. Springer, Berlin, pp 49–50
Huggins P, Owen M, Yoshida R (2012) First steps toward the geometry of cophylogeny. In: Hibi T (ed) Harmony of Gröbner bases and the modern industrial society. World Scientific, Singapore, pp 99–116
Isbell JR (1964) Six theorems about injective metric spaces. Comment Math Helv 39(1):65–76
Karmarkar N (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4(4):373–395
Kelk S, Fischer M (2017) On the complexity of computing MP distance between binary phylogenetic trees. Ann Comb 21(4):573–604
Kendall M, Colijn C (2016) Mapping phylogenetic trees to reveal distinct patterns of evolution. Mol Biol Evol 33(10):2735–2743
Lang U, Pavón M, Züst R (2013) Metric stability of trees and tight spans. Arch Math 101(1):91–100
Liebscher V (2015) gromovlab: Gromov–Hausdorff type distances for labeled metric spaces. R package version 0.7-6. http://CRAN.R-project.org/package=gromovlab
Lin Y, Rajan V, Moret BME (2012) A metric for phylogenetic trees based on matching. IEEE/ACM Trans Comput Biol Bioinform 9(4):1014–1022
Lin B, Sturmfels B, Tang X, Yoshida R (2017) Convexity in tree spaces. SIAM J Discrete Math 31(3):2015–2038
Mémoli F (2007) On the use of Gromov–Hausdorff distances for shape comparison. In: Symposium on point based graphics, Prague, Sept 2007
Moulton V, Wu T (2015) A parsimony-based metric for phylogenetic trees. Adv Appl Math 66:22–45
Nye TMW (2011) Principal components analysis in the space of phylogenetic trees. Ann Stat 39(5):2716–2739
Owen M, Provan J (2011) A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans Comput Biol Bioinform 8(1):2–13
Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20(2):289–290
Pardalos PM, Wolkowicz H (eds) (1994) Quadratic assignment and related problems. DIMACS series in discrete mathematics and theoretical computer science, vol 16. AMS, Providence, RI. Papers from the workshop held at Rutgers University, New Brunswick, New Jersey, May 20–21, 1993
Pattengale ND, Gottlieb EJ, Moret BM (2007) Efficiently computing the Robinson–Foulds metric. J Comput Biol 14(6):724–735
Penny D, Hendy MD (1985) The use of tree comparison metrics. Syst Biol 34(1):75–82
R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, version 3.4.3, Vienna, Austria. http://www.R-project.org/
Robinson DF (1971) Comparison of labeled trees with valency three. J Comb Theory 11:105–119
Robinson DF, Foulds LR (1979) Comparison of weighted labelled trees. In: Combinatorial mathematics VI. Lecture Notes in Mathematics, vol 748, pp 119–126. Springer, Berlin
Robinson DF, Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci 53:131–147
Semple C, Steel MA (2003) Phylogenetics. Oxford University Press, Oxford
Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon 11:33–40
Steel MA, Penny D (1993) Distributions of tree comparison metrics—some new results. Syst Biol 42(2):126–141
Tuzhilin AA (2016) Who invented the Gromov–Hausdorff distance? arXiv:1612.00728
Villar S, Bandeira AS, Blumberg AJ, Ward R (2016) A polynomial-time relaxation of the Gromov–Hausdorff distance. arXiv:1610.05214
Whidden C, Beiko RG, Zeh N (2016) Fixed-parameter and approximation algorithms for maximum agreement forests of multifurcating trees. Algorithmica 74(3):1019–1054
Williams WT, Clifford HT (1971) On the comparison of two classifications of the same set of elements. Taxon 20:519–522
Zaretskii KA (1965) Constructing a tree on the basis of a set of distances between the hanging vertices (in Russian). Uspekhi Mat Nauk 20(6):90–92
Acknowledgements
First of all, I have to thank Mareike Fischer for introducing me to the world of phylogenetic distances. She also helped a lot in arriving at a clear notation. Second, I’m very grateful to Jürgen Eichhorn, who unknowingly drew my attention to metrics between metric spaces. Third, I’d like to thank Michelle Kendall for her inspiring talk at the Portobello conference 2015 and for additional discussion later. Fourth, I thank Mike Steel for many interesting discussions, useful hints, his kind hospitality during my stay in Christchurch in 2010, and for organising the amazing 2015 workshop in Kaikoura with its inspiring and open atmosphere. Further, Miroslav Bačak, Andrew Francis, Alexander Gavryushkin, Stefan Grünewald, Marc Hellmuth and Giulio dalla Riva gave useful hints and inspiration in many discussions. The questions and hints of five anonymous referees regarding previous versions of this manuscript helped to improve it substantially.
A On Semimetric Extensions
Several times we encountered the question of whether a partial dissimilarity on X, i.e. a map \(q:E\rightarrow \mathbb {R}_{\ge 0}\), \(E\subseteq \left( {\begin{array}{c}X\\ 2\end{array}}\right) \), has an extension to a semimetric on X. This seems to be a well-known problem; I found one folklore solution in Guénoche et al. (2004). For our needs, the following reformulation proved more useful.
We call a cycle \(p=x_0x_1\dots x_m\), \(x_0=x_m\), in a graph (X, E) induced if it is simple (the \(x_i\), \(i=0,\dots ,m-1\), are pairwise distinct) and chordless (\(\left\{ x_i,x_j\right\} \notin E\) for all \(0\le i,j\le m-1\) with \(2\le \left|i-j\right|\le m-2 \)).
Theorem 6
If the graph \(G=(X,E)\) is connected, then \(q:E\rightarrow \mathbb {R}_{\ge 0}\) extends to a semimetric on X if and only if for all induced cycles p of G and all edges e in p
\(q(e)\le \sum _{e'\in p,\,e'\ne e}q(e').\) (17)
Proof
By Guénoche et al. (2004), Proposition 2.1, q has a semimetric extension if and only if \(q(\left\{ x,y\right\} )=d^q_G(x,y)\) for all \(\left\{ x,y\right\} \in E\), where \(d^q_G\) was introduced in (3).
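The criterion of Proposition 2.1 is easy to test numerically. The following is a minimal sketch, assuming that \(d^q_G\) from (3) is the shortest-path distance in G with edge weights q (definition (3) is not reproduced in this appendix). The function names and the toy weights are illustrative only and are not taken from the gromovlab package.

```python
import math


def shortest_path_distance(points, q):
    """Floyd-Warshall shortest-path distance on the weighted graph (points, q).

    q maps ordered pairs (x, y), one per undirected edge, to a weight >= 0.
    """
    d = {(x, y): 0.0 if x == y else math.inf for x in points for y in points}
    for (x, y), w in q.items():
        d[x, y] = d[y, x] = min(d[x, y], w)
    for k in points:
        for x in points:
            for y in points:
                d[x, y] = min(d[x, y], d[x, k] + d[k, y])
    return d


def has_semimetric_extension(points, q, tol=1e-12):
    """True iff q(e) equals the shortest-path distance for every edge e."""
    d = shortest_path_distance(points, q)
    return all(abs(d[x, y] - w) <= tol for (x, y), w in q.items())


points = ["a", "b", "c"]
# Triangle a-b-c: the edge {a, c} may not be longer than the detour via b.
good = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0}
bad = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 3.0}

print(has_semimetric_extension(points, good))
print(has_semimetric_extension(points, bad))
```

When the check succeeds on a connected graph, the shortest-path distance itself is one semimetric extension of q.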
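Suppose q extends to a semimetric d on X. Fix an induced cycle \(p=x_0x_1\dots x_{m-1} x_m\), \(x_m=x_0\), and the edge \(e=\left\{ x_0,x_1\right\} \) in p. By the triangle inequality for d we obtain
\(q(\left\{ x_0,x_1\right\} )=d(x_0,x_1)\le \sum _{i=1}^{m-1}d(x_i,x_{i+1})=\sum _{i=1}^{m-1}q(\left\{ x_i,x_{i+1}\right\} ),\)
which is (17).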
Now assume (17) is fulfilled, but there is no extension to a semimetric. Thus, we find \(\left\{ x,y\right\} \in E\) such that \(q(\left\{ x,y\right\} )>d^q_G(x,y)\). This means there is a path \(\tilde{p}=x_0x_1\dots x_{m-1}\), \(x_0=x\), \(x_{m-1}=y\), such that
\(\sum _{i=0}^{m-2}q(\left\{ x_i,x_{i+1}\right\} )<q(\left\{ x,y\right\} ).\)
We may assume w.l.o.g. that m is minimal. Thus, \(x_i\), \(i=0,\dots ,m-1\) are different. Setting \(x_m=x_0\), \(e=\left\{ x,y\right\} =\left\{ x_0,x_{m-1}\right\} \), the (simple) cycle \(p=x_0x_1\dots x_m\) violates (17). Suppose now that p has a chord, say \(\left\{ x_i,x_j\right\} \). Since m is minimal, we know
and
Substituting the first inequality into the right hand side of the second one yields
This contradiction shows that p is an induced cycle and completes the proof. \(\square \)
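The chord step of the proof can be spelled out as follows. This is only a sketch under an assumption that the text does not state explicitly: the minimality of m is understood over all edges \(\left\{ x,y\right\} \in E\) and all paths violating the criterion, so that every path with fewer than m vertices satisfies it. The chord is written \(\left\{ x_i,x_j\right\} \) with \(i<j\), and the labels (A) and (B) are introduced here for illustration.

```latex
% Chord \{x_i, x_j\} with 0 <= i < j <= m-1 and 2 <= j-i <= m-2.

% (A) The path x_i x_{i+1} ... x_j joins the endpoints of the edge \{x_i, x_j\}
%     and has fewer than m vertices, so by minimality it is not violating:
\[
  q(\{x_i,x_j\}) \le \sum_{k=i}^{j-1} q(\{x_k,x_{k+1}\}). \tag{A}
\]

% (B) The shortcut path x_0 ... x_i x_j ... x_{m-1} joins x and y
%     and also has fewer than m vertices:
\[
  q(\{x,y\}) \le \sum_{k=0}^{i-1} q(\{x_k,x_{k+1}\}) + q(\{x_i,x_j\})
               + \sum_{k=j}^{m-2} q(\{x_k,x_{k+1}\}). \tag{B}
\]

% Substituting (A) into the right hand side of (B):
\[
  q(\{x,y\}) \le \sum_{k=0}^{m-2} q(\{x_k,x_{k+1}\}),
\]
% which contradicts the strict inequality satisfied by the path \tilde{p}.
```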
We can use this result for the
Proof of Theorem 2
We apply Theorem 6 to \(X\cup X'\), \(E=\left( {\begin{array}{c}X\\ 2\end{array}}\right) \cup \left( {\begin{array}{c}X'\\ 2\end{array}}\right) \cup \left\{ \left\{ x,x'\right\} :x\in X\right\} \) and \(q:E\rightarrow \mathbb {R}_{\ge 0}\) given by
Induced cycles in \((X\cup X',E)\) are either triangles in X, triangles in \(X'\) or quadrangles \(x,y,y',x',x\). For the two former, (17) is equivalent to the triangle inequalities for \(\rho ,\rho '\). For the latter, (17) is the same as (5). \(\square \)
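The classification of induced cycles used above can be checked by brute force for a small label set. The sketch below assumes that \(X'\) is a disjoint copy of X with \(x'\) the copy of x, as the definition of E suggests. The helper `induced_cycles` and the encoding of vertices as pairs are illustrative choices, and the enumeration is exponential in the number of vertices, so it is only meant for tiny examples.

```python
from itertools import combinations


def induced_cycles(vertices, edges):
    """Return all vertex sets whose induced subgraph is a single cycle."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    found = []
    for size in range(3, len(vertices) + 1):
        for subset in combinations(vertices, size):
            s = set(subset)
            # An induced cycle is 2-regular inside its own vertex set ...
            if any(len(adj[v] & s) != 2 for v in s):
                continue
            # ... and connected (otherwise it is a disjoint union of cycles).
            seen, stack = {subset[0]}, [subset[0]]
            while stack:
                v = stack.pop()
                for w in adj[v] & s:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            if seen == s:
                found.append(frozenset(s))
    return found


n = 4
X = [("x", i) for i in range(n)]
Xp = [("x'", i) for i in range(n)]
# E = all pairs in X, all pairs in X', and the matching edges {x, x'}.
edges = list(combinations(X, 2)) + list(combinations(Xp, 2)) + list(zip(X, Xp))
cycles = induced_cycles(X + Xp, edges)

triangles = [c for c in cycles if len(c) == 3]
quadrangles = [c for c in cycles if len(c) == 4]
print(len(triangles), len(quadrangles), len(cycles))
```

For four labels this finds four triangles in X, four in \(X'\), and one quadrangle \(x,y,y',x'\) for each of the six pairs \(\left\{ x,y\right\} \), with no longer induced cycles.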
The following result was used in the proof of Theorem 1.
Lemma 7
Suppose X, Y, Z are disjoint sets and there are given \(d_1\in M(X\cup Y)\) and \(d_2\in M(Y\cup Z)\) such that \(d_1|_{\left( {\begin{array}{c}Y\\ 2\end{array}}\right) }=d_2|_{\left( {\begin{array}{c}Y\\ 2\end{array}}\right) }\). Then, there exists a \(d\in M(X\cup Y\cup Z)\) such that \(d|_{\left( {\begin{array}{c}X\cup Y\\ 2\end{array}}\right) }=d_1\) and \(d|_{\left( {\begin{array}{c}Y\cup Z\\ 2\end{array}}\right) }=d_2\).
Proof
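We apply Theorem 6 to the graph \(\left( X\cup Y\cup Z,\left( {\begin{array}{c}X\cup Y\\ 2\end{array}}\right) \cup \left( {\begin{array}{c}Y\cup Z\\ 2\end{array}}\right) \right) \) with \(q(e)=d_1(e)\) for \(e\in \left( {\begin{array}{c}X\cup Y\\ 2\end{array}}\right) \) and \(q(e)=d_2(e)\) for \(e\in \left( {\begin{array}{c}Y\cup Z\\ 2\end{array}}\right) \). This q is well defined because \(d_1\) and \(d_2\) agree on \(\left( {\begin{array}{c}Y\\ 2\end{array}}\right) \).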
Since both \(X\cup Y\) and \(Y\cup Z\) are complete in this graph, the only induced cycles are triangles. The triangle inequalities for \(d_1,d_2\) show (17). \(\square \)
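One concrete extension in Lemma 7 is the shortest-path gluing \(d(x,z)=\min _{y\in Y}\left( d_1(x,y)+d_2(y,z)\right) \) for \(x\in X\), \(z\in Z\). The sketch below builds it for a toy example, assuming Y is non-empty and that the elements of \(M(\cdot )\) are stored as dictionaries keyed by unordered pairs. The function names and the numbers are made up for illustration.

```python
def pair(a, b):
    """Key for the unordered pair {a, b}."""
    return frozenset((a, b))


def glue(X, Y, Z, d1, d2):
    """Glue d1 on X∪Y and d2 on Y∪Z along Y via shortest paths through Y."""
    d = dict(d1)
    d.update(d2)  # d1 and d2 agree on pairs inside Y by assumption
    for x in X:
        for z in Z:
            d[pair(x, z)] = min(d1[pair(x, y)] + d2[pair(y, z)] for y in Y)
    return d


def is_semimetric(points, d, tol=1e-12):
    """Check non-negativity and all triangle inequalities."""
    def dist(a, b):
        return 0.0 if a == b else d[pair(a, b)]

    return all(v >= 0 for v in d.values()) and all(
        dist(a, c) <= dist(a, b) + dist(b, c) + tol
        for a in points for b in points for c in points
    )


X, Y, Z = ["a"], ["b", "c"], ["e"]
d1 = {pair("a", "b"): 1.0, pair("a", "c"): 2.0, pair("b", "c"): 2.0}
d2 = {pair("b", "c"): 2.0, pair("b", "e"): 3.0, pair("c", "e"): 1.0}

d = glue(X, Y, Z, d1, d2)
print(d[pair("a", "e")])
print(is_semimetric(X + Y + Z, d))
```

Other extensions may exist; the glued value is the largest one compatible with the triangle inequalities through Y.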
Cite this article
Liebscher, V. New Gromov-Inspired Metrics on Phylogenetic Tree Space. Bull Math Biol 80, 493–518 (2018). https://doi.org/10.1007/s11538-017-0385-z