New Gromov-Inspired Metrics on Phylogenetic Tree Space

  • Original Article
  • Bulletin of Mathematical Biology

Abstract

We present a new class of metrics for unrooted phylogenetic X-trees inspired by the Gromov–Hausdorff distance for (compact) metric spaces. These metrics can be computed efficiently by linear or quadratic programming, and they are robust under NNI operations. Their local behaviour shows that they differ from all previously introduced metrics. The performance of the metrics is briefly analysed on random weighted and unweighted trees as well as random caterpillars.

References

  • Agarwal PK, Fox K, Nath A, Sidiropoulos A, Wang Y (2015) Computing the Gromov–Hausdorff distance for metric trees. In: Elbassioni K, Makino K (eds) Algorithms and computation. Lecture Notes in Computer Science, vol 9472, pp 529–540. Springer, Berlin. arXiv:1509.05751

  • Allen BL, Steel M (2001) Subtree transfer operations and their induced metrics on evolutionary trees. Ann Comb 5:1–15

  • Benner P, Bačák M, Bourguignon P-Y (2014) Point estimates in phylogenetic reconstructions. Bioinformatics 30:i534–i540

  • Berkelaar M et al (2015) lpSolve: Interface to “Lp_solve” v. 5.5 to solve linear/integer programs. R package version 5.6.13. https://CRAN.R-project.org/package=lpSolve

  • Bernstein DI (2017) L-infinity optimization to Bergman fans of matroids with an application to phylogenetics. arXiv:1702.05141

  • Bernstein DI, Long C (2017) L-infinity optimization to linear spaces and phylogenetic trees. arXiv:1702.05127

  • Billera LJ, Holmes SP, Vogtmann K (2001) Geometry of the space of phylogenetic trees. Adv Appl Math 27(4):733–767

  • Bogdanowicz D, Giaro K (2012) Matching split distance for unrooted binary phylogenetic trees. IEEE/ACM Trans Comput Biol Bioinform 9(1):150–160

  • Bonet ML, St. John K (2010) On the complexity of uSPR distance. IEEE/ACM Trans Comput Biol Bioinform 7(3):572–576

  • Bourque M (1978) Arbres de Steiner et reseaux dont certains sommets sont a localisation variable. PhD thesis, Montreal

  • Brodal GS, Fagerberg R, Pedersen CNS (2001) Computing the quartet distance between evolutionary trees in time \({\rm O}(n\log ^2n)\). In: Proceedings of the 12th international symposium on algorithms and computation (ISAAC). Lecture Notes in Computer Science, vol 2223, pp 731–737. Springer

  • Buneman P (1971) The recovery of trees from measures of dissimilarity. In: Kendall DG, Tautu P (eds) Mathematics in the archeological and historical sciences. Edinburgh University Press, Edinburgh, pp 387–395

  • Buneman P (1974) A note on the metric properties of trees. J Comb Theory 17(1):48–50

  • Burago D, Burago Y, Ivanov S (2001) A course in metric geometry. Graduate studies in mathematics, vol 33. American Mathematical Society, Providence

  • Chakerian J, Holmes S (2017) Distory: distance between phylogenetic histories. R package version 1.4.3. http://CRAN.R-project.org/package=distory

  • Coons JI, Rusinko J (2016) A note on the path interval distance. J Theor Biol 398:145–149

  • Cristina J (2008) Gromov–Hausdorff convergence of metric spaces, Helsinki. http://www.helsinki.fi/~cristina/pdfs/gromovHausdorff.pdf. Accessed 2 Feb 2015

  • DasGupta B, He X, Jiang T, Li M, Tromp J, Zhang L (1997) On distances between phylogenetic trees. In: Proceedings of the eighth ACM/SIAM symposium discrete algorithms (SODA ’97), pp 427–436

  • Day WHE (1985) Optimal algorithms for comparing trees with labeled leaves. J Classif 2(1):7–28

  • Dress A (1984) Trees, tight extensions of metric spaces, and the cohomological dimension of certain groups: a note on combinatorial properties of metric spaces. Adv Math 53(3):321–402

  • Dress A, Holland B, Huber KT, Koolen J, Moulton V, Weyer-Menkoff J (2005) \(\Delta \)-additive and \(\Delta \)-ultra-additive maps, Gromov’s trees and the Farris transform. Discrete Appl Math 146:51–73

  • Edwards DA (1975) The structure of superspace. In: Stavrakas NM, Allen KR (eds) Studies in topology. Academic Press, New York, pp 121–133

  • Estabrook GF, McMorris FR, Meacham CA (1985) Comparison of undirected phylogenetic trees based on subtrees of four evolutionary units. Syst Zool 34(2):193–200

  • Fischer M, Kelk S (2016) On the maximum parsimony distance between phylogenetic trees. Ann Comb 20(1):87–113

  • Gavryushkin A, Drummond A (2016) The space of ultrametric phylogenetic trees. J Theor Biol 403:197–208

  • Gromov M (1981) Groups of polynomial growth and expanding maps. Publ Math IHÉS 53:53–73

  • Guénoche A, Leclerc B, Makarenkov V (2004) On the extension of a partial metric to a tree metric. Discrete Math 276:229–248

  • Hoffman AJ, Kruskal J (2010) Introduction to integral boundary points of convex polyhedra. In: Jünger M et al (eds) 50 years of integer programming, 1958–2008. Springer, Berlin, pp 49–50

  • Huggins P, Owen M, Yoshida R (2012) First steps toward the geometry of cophylogeny. In: Hibi T (ed) Harmony of Gröbner bases and the modern industrial society. World Scientific, Singapore, pp 99–116

  • Isbell JR (1964) Six theorems about injective metric spaces. Comment Math Helv 39(1):65–76

  • Karmarkar N (1984) A new polynomial-time algorithm for linear programming. Combinatorica 4(4):373–395

  • Kelk S, Fischer M (2017) On the complexity of computing MP distance between binary phylogenetic trees. Ann Comb 21(4):573–604

  • Kendall M, Colijn C (2016) Mapping phylogenetic trees to reveal distinct patterns of evolution. Mol Biol Evol 33(10):2735–2743

  • Lang U, Pavón M, Züst R (2013) Metric stability of trees and tight spans. Arch Math 101(1):91–100

  • Liebscher V (2015) gromovlab: Gromov–Hausdorff type distances for labeled metric spaces. R package version 0.7-6. http://CRAN.R-project.org/package=gromovlab

  • Lin Y, Rajan V, Moret BME (2012) A metric for phylogenetic trees based on matching. IEEE/ACM Trans Comput Biol Bioinform 9(4):1014–1022

  • Lin B, Sturmfels B, Tang X, Yoshida R (2017) Convexity in tree spaces. SIAM J Discrete Math 31(3):2015–2038

  • Mémoli F (2007) On the use of Gromov–Hausdorff distances for shape comparison. In: Symposium on point based graphics, Prague, Sept 2007

  • Moulton V, Wu T (2015) A parsimony-based metric for phylogenetic trees. Adv Appl Math 66:22–45

  • Nye TMW (2011) Principal components analysis in the space of phylogenetic trees. Ann Stat 39(5):2716–2739

  • Owen M, Provan J (2011) A fast algorithm for computing geodesic distances in tree space. IEEE/ACM Trans Comput Biol Bioinform 8(1):2–13

  • Paradis E, Claude J, Strimmer K (2004) APE: analyses of phylogenetics and evolution in R language. Bioinformatics 20(2):289–290

  • Pardalos PM, Wolkowicz H (eds) (1994) Quadratic assignment and related problems. DIMACS series in discrete mathematics and theoretical computer science, vol 16. AMS, Providence, RI. Papers from the workshop held at Rutgers University, New Brunswick, New Jersey, May 20–21, 1993

  • Pattengale ND, Gottlieb EJ, Moret BM (2007) Efficiently computing the Robinson–Foulds metric. J Comput Biol 14(6):724–735

  • Penny D, Hendy MD (1985) The use of tree comparison metrics. Syst Biol 34(1):75–82

  • R Core Team (2017) R: a language and environment for statistical computing. R Foundation for Statistical Computing, version 3.4.3, Vienna, Austria. http://www.R-project.org/

  • Robinson DF (1971) Comparison of labeled trees with valency three. J Comb Theory 11:105–119

  • Robinson DF, Foulds LR (1979) Comparison of weighted labelled trees. In: Combinatorial mathematics VI. Lecture Notes in Mathematics, vol 748, pp 119–126. Springer, Berlin

  • Robinson DF, Foulds LR (1981) Comparison of phylogenetic trees. Math Biosci 53:131–147

  • Semple C, Steel MA (2003) Phylogenetics. Oxford University Press, Oxford

  • Sokal RR, Rohlf FJ (1962) The comparison of dendrograms by objective methods. Taxon 11:33–40

  • Steel MA, Penny D (1993) Distributions of tree comparison metrics—some new results. Syst Biol 42(2):126–141

  • Tuzhilin AA (2016) Who invented the Gromov–Hausdorff distance? arXiv:1612.00728

  • Villar S, Bandeira AS, Blumberg AJ, Ward R (2016) A polynomial-time relaxation of the Gromov–Hausdorff distance. arXiv:1610.05214

  • Whidden C, Beiko RG, Zeh N (2016) Fixed-parameter and approximation algorithms for maximum agreement forests of multifurcating trees. Algorithmica 74(3):1019–1054

  • Williams WT, Clifford HT (1971) On the comparison of two classifications of the same set of elements. Taxon 20:519–522

  • Zaretskii KA (1965) Constructing a tree on the basis of a set of distances between the hanging vertices (in Russian). Uspekhi Mat Nauk 20(6):90–92

Acknowledgements

First of all, I thank Mareike Fischer for introducing me to the world of phylogenetic distances; she also helped a lot in arriving at a clear notation. Second, I am very grateful to Jürgen Eichhorn, who unwittingly drew my attention to metrics between metric spaces. Third, I would like to thank Michelle Kendall for her inspiring talk at the Portobello conference 2015 and additional discussions later. Fourth, I thank Mike Steel for many interesting discussions, useful hints, his kind hospitality during my stay in Christchurch in 2010, and for organising the amazing 2015 workshop in Kaikoura with its inspiring and open atmosphere. Further, Miroslav Bačák, Andrew Francis, Alexander Gavryushkin, Stefan Grünewald, Marc Hellmuth and Giulio dalla Riva gave useful hints and inspiration in many discussions. The questions and hints of five anonymous referees regarding previous versions of this manuscript helped to improve it substantially.

Author information

Correspondence to Volkmar Liebscher.

A On Semimetric Extensions

Several times we encountered the question of whether a partial dissimilarity on X, i.e. a map \(q:E\rightarrow \mathbb {R}_{\ge 0}\), \(E\subseteq \left( {\begin{array}{c}X\\ 2\end{array}}\right) \), has an extension to a semimetric on X. This appears to be a well-known problem; one folklore solution can be found in Guénoche et al. (2004). For our needs, the following reformulation proved more useful.

We call a cycle \(p=x_0x_1\dots x_m\), \(x_0=x_m\), in a graph \((X,E)\) induced if it is simple (the vertices \(x_i\), \(i=0,\dots ,m-1\), are pairwise distinct) and chordless (\(\left\{ x_i,x_j\right\} \notin E\) whenever \(0\le i,j\le m-1\) and \(2\le \left|i-j\right|\le m-2 \)).

Theorem 6

If the graph \(G=(X,E)\) is connected, then \(q:E\rightarrow \mathbb {R}_{\ge 0}\) extends to a semimetric on X if and only if for all induced cycles p of G and all edges e in p

$$\begin{aligned} 2q(e)\le \mathrm {len}(p). \end{aligned}$$
(17)

Proof

By Guénoche et al. (2004), Proposition 2.1, q has a semimetric extension if and only if \(q(\left\{ x,y\right\} )=d^q_G(x,y)\) for all \(\left\{ x,y\right\} \in E\), where \(d^q_G\) is the path distance introduced in (3).

Suppose first that q extends to a semimetric. Fix an induced cycle \(p=x_0x_1\dots x_{m-1} x_m\), \(x_m=x_0\), and the edge \(e=\left\{ x_0,x_1\right\} \) in p (by relabelling the cycle, any edge of p can be brought into this position). We obtain

$$\begin{aligned} q(\left\{ x_0,x_1\right\} )= & {} d^q_G(x_0,x_1)\le \mathrm {len}(x_1\dots x_{m-1}x_0)= \sum _{k=1}^{m-1}q(\left\{ x_k,x_{k+1}\right\} )\\ 2q(\left\{ x_0,x_1\right\} )\le & {} q(\left\{ x_0,x_1\right\} )+ \sum _{k=1}^{m-1}q(\left\{ x_k,x_{k+1}\right\} )=\mathrm {len}(p). \end{aligned}$$

Conversely, assume that (17) holds but q has no extension to a semimetric. Then there is an edge \(\left\{ x,y\right\} \in E\) such that \(q(\left\{ x,y\right\} )>d^q_G(x,y)\). This means there is a path \(\tilde{p}=x_0x_1\dots x_{m-1}\), \(x_0=x\), \(x_{m-1}=y\), such that

$$\begin{aligned} q(\left\{ x_0,x_{m-1}\right\} )>\mathrm {len}(\tilde{p})=\sum _{k=0}^{m-2}q(\left\{ x_k,x_{k+1}\right\} ). \end{aligned}$$

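We may assume w.l.o.g. that m is minimal among all such edges and paths. Thus, the vertices \(x_i\), \(i=0,\dots ,m-1\), are pairwise distinct. Setting \(x_m=x_0\), \(e=\left\{ x,y\right\} =\left\{ x_0,x_{m-1}\right\} \), the (simple) cycle \(p=x_0x_1\dots x_m\) violates (17). Suppose now that p has a chord, say \(\left\{ x_i,x_j\right\} \) with \(i<j\). Both the subpath \(x_i\dots x_j\) and the path obtained from \(\tilde{p}\) by taking the chord as a shortcut have fewer than m vertices. Since m is minimal, we know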

$$\begin{aligned} q\left( \left\{ x_i,x_j\right\} \right) \le \sum _{k=i}^{j-1}q(\left\{ x_k,x_{k+1}\right\} ) \end{aligned}$$

and

$$\begin{aligned} q(\left\{ x_0,x_{m-1}\right\} )\le \sum _{k=0}^{i-1}q(\left\{ x_k,x_{k+1}\right\} )+q\left( \left\{ x_i,x_j\right\} \right) +\sum _{k=j}^{m-2}q(\left\{ x_k,x_{k+1}\right\} ). \end{aligned}$$

Substituting the first inequality into the right-hand side of the second one yields

$$\begin{aligned} q(\left\{ x_0,x_{m-1}\right\} )\le \sum _{k=0}^{m-2}q(\left\{ x_k,x_{k+1}\right\} )=\mathrm {len}(\tilde{p}), \end{aligned}$$

which contradicts the choice of \(\tilde{p}\). Hence p has no chord, so p is an induced cycle violating (17), contrary to our assumption. This completes the proof. \(\square \)
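The criterion of Guénoche et al. (2004) used at the start of the proof can be tested numerically. The following is a minimal sketch, not the author's implementation: it assumes that \(d^q_G\) is the shortest-path distance with edge weights q, and the names `shortest_path_distance`, `semimetric_extension` and `pair` are hypothetical helpers introduced only for illustration.

```python
from itertools import product


def shortest_path_distance(points, q):
    """All-pairs shortest-path distances (Floyd-Warshall) in the graph whose
    edges are the keys of q, weighted by q.

    points: list of hashable labels (the set X).
    q: dict mapping frozenset({u, v}) to a nonnegative weight (the map q on E).
    """
    inf = float("inf")
    d = {(u, v): (0.0 if u == v else inf) for u, v in product(points, repeat=2)}
    for e, weight in q.items():
        u, v = tuple(e)
        d[u, v] = d[v, u] = min(d[u, v], weight)
    for k in points:
        for u in points:
            for v in points:
                if d[u, k] + d[k, v] < d[u, v]:
                    d[u, v] = d[u, k] + d[k, v]
    return d


def semimetric_extension(points, q, tol=1e-12):
    """Return the shortest-path extension of q, or None if some edge is
    strictly longer than a path joining its endpoints."""
    d = shortest_path_distance(points, q)
    for e, weight in q.items():
        u, v = tuple(e)
        if weight > d[u, v] + tol:
            return None
    return d


def pair(u, v):
    return frozenset((u, v))


points = ["a", "b", "c", "d"]
# 4-cycle a-b-c-d-a; its only induced cycle is the cycle itself (length 5).
good = {pair("a", "b"): 1, pair("b", "c"): 1, pair("c", "d"): 1, pair("d", "a"): 2}
# Same cycle with one long edge: 2 * 4 > 1 + 1 + 1 + 4, so (17) fails.
bad = dict(good)
bad[pair("d", "a")] = 4

ext = semimetric_extension(points, good)
print(ext["a", "c"])                      # value assigned to the non-edge {a, c}
print(semimetric_extension(points, bad))  # None
```

For a connected graph, the returned dictionary is one possible semimetric extension; if the graph is disconnected, some entries remain infinite, which is why Theorem 6 assumes connectedness.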

We can use this result for the

Proof of Theorem 2

We apply Theorem 6 to \(X\cup X'\), \(E=\left( {\begin{array}{c}X\\ 2\end{array}}\right) \cup \left( {\begin{array}{c}X'\\ 2\end{array}}\right) \cup \left\{ \left\{ x,x'\right\} :x\in X\right\} \) and \(q:E\rightarrow \mathbb {R}_{\ge 0}\) given by

$$\begin{aligned} q(\left\{ u,v\right\} )=\left\{ { \begin{array}{cl} \rho (u,v)&{}\quad u,v\in X\\ \rho '(x,y)&{}\quad u=x',v=y', x,y\in X\\ \delta _x&{}\quad u=x,v=x', x\in X \end{array}}\right. . \end{aligned}$$

Induced cycles in \((X\cup X',E)\) are either triangles in X, triangles in \(X'\) or quadrangles \(x,y,y',x',x\). For the two former, (17) is equivalent to the triangle inequalities for \(\rho ,\rho '\). For the latter, (17) is the same as (5). \(\square \)
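To make the last step concrete, condition (17) can be written out for a quadrangle \(p=x\,y\,y'\,x'\,x\). This is only a spelled-out instance of (17) under the definition of q above; we do not reproduce (5) here. The cycle has length \(\mathrm {len}(p)=\rho (x,y)+\delta _y+\rho '(x,y)+\delta _x\), so applying (17) to each of its four edges gives

$$\begin{aligned} \rho (x,y)\le & {} \rho '(x,y)+\delta _x+\delta _y,\\ \rho '(x,y)\le & {} \rho (x,y)+\delta _x+\delta _y,\\ \delta _x\le & {} \rho (x,y)+\rho '(x,y)+\delta _y,\\ \delta _y\le & {} \rho (x,y)+\rho '(x,y)+\delta _x. \end{aligned}$$

The first two inequalities together say that \(\left|\rho (x,y)-\rho '(x,y)\right|\le \delta _x+\delta _y\).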

The following result was used in the proof of Theorem 1.

Lemma 7

Suppose \(X\), \(Y\), \(Z\) are disjoint sets and let \(d_1\in M(X\cup Y)\) and \(d_2\in M(Y\cup Z)\) be such that \(d_1|_{\left( {\begin{array}{c}Y\\ 2\end{array}}\right) }=d_2|_{\left( {\begin{array}{c}Y\\ 2\end{array}}\right) }\). Then, there exists a \(d\in M(X\cup Y\cup Z)\) such that \(d|_{\left( {\begin{array}{c}X\cup Y\\ 2\end{array}}\right) }=d_1\) and \(d|_{\left( {\begin{array}{c}Y\cup Z\\ 2\end{array}}\right) }=d_2\).

Proof

We apply Theorem 6 to the graph \(\left( X\cup Y\cup Z,\left( {\begin{array}{c}X\cup Y\\ 2\end{array}}\right) \cup \left( {\begin{array}{c}Y\cup Z\\ 2\end{array}}\right) \right) \) with

$$\begin{aligned} q(\left\{ u,v\right\} )=\left\{ {\begin{array}{ll} d_1(u,v)&{}\quad u,v\in X\cup Y\\ d_2(u,v)&{}\quad u,v\in Y\cup Z \end{array}}\right. . \end{aligned}$$

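Here q is well defined because \(d_1\) and \(d_2\) agree on \(\left( {\begin{array}{c}Y\\ 2\end{array}}\right) \). Since both \(X\cup Y\) and \(Y\cup Z\) are complete in this graph, the only induced cycles are triangles. The triangle inequalities for \(d_1,d_2\) show (17). \(\square \)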
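The lemma only asserts existence, but one explicit choice is the shortest-path extension, which on pairs \(x\in X\), \(z\in Z\) reduces to a minimum over intermediate points of Y. The sketch below illustrates this under the assumptions that Y is non-empty and that \(d_1,d_2\) are given as symmetric functions; `glue` and `from_table` are hypothetical names, and the numbers are made up for illustration.

```python
def from_table(table):
    """Turn a dict keyed by frozenset({u, v}) into a symmetric function
    that is zero on the diagonal."""
    return lambda u, v: 0.0 if u == v else table[frozenset((u, v))]


def glue(X, Y, Z, d1, d2):
    """Glue d1 on X ∪ Y and d2 on Y ∪ Z (assumed to agree on Y, with Y
    non-empty) into a function on X ∪ Y ∪ Z, returned as a dict keyed by
    ordered pairs. Pairs (x, z) get min over y of d1(x, y) + d2(y, z)."""
    XY = list(X) + list(Y)
    YZ = list(Y) + list(Z)
    d = {}
    for u in XY:
        for v in XY:
            d[u, v] = d1(u, v)
    for u in YZ:
        for v in YZ:
            d[u, v] = d2(u, v)
    for x in X:
        for z in Z:
            d[x, z] = d[z, x] = min(d1(x, y) + d2(y, z) for y in Y)
    return d


X, Y, Z = ["x"], ["y1", "y2"], ["z"]
d1 = from_table({
    frozenset(("x", "y1")): 1,
    frozenset(("x", "y2")): 2,
    frozenset(("y1", "y2")): 1,
})
d2 = from_table({
    frozenset(("y1", "y2")): 1,   # agrees with d1 on Y
    frozenset(("y1", "z")): 3,
    frozenset(("y2", "z")): 2,
})

d = glue(X, Y, Z, d1, d2)
pts = X + Y + Z
# Check all triangle inequalities of the glued function on this example.
ok = all(d[u, w] <= d[u, v] + d[v, w] for u in pts for v in pts for w in pts)
print(d["x", "z"], ok)
```

Other extensions may exist; the minimum over Y is simply the largest value compatible with the triangle inequalities through points of Y.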

Cite this article

Liebscher, V. New Gromov-Inspired Metrics on Phylogenetic Tree Space. Bull Math Biol 80, 493–518 (2018). https://doi.org/10.1007/s11538-017-0385-z
